Searching in parallel for similar strings [biological sequences]
- 1 January 1994
- journal article
- Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Computational Science and Engineering
- Vol. 1 (2), 60-75
- https://doi.org/10.1109/99.326666
Abstract
Distributed computation, probabilistic indexing and hashing techniques combine to create a novel approach to processing very large biological-sequence databases. Other data-intensive tasks could also benefit. Our indexing-based approach enables fast similarity searching through a large database of strings. Thanks to a redundant table-lookup scheme, recovering database items that match a test sequence requires minimal data access. We have implemented a uniprocessor version of this approach called Flash (Fast Lookup Algorithm for String Homology) as well as a distributed version, dFlash, using a cluster of seven non-dedicated workstations connected through a local area network. In this article, we present an approach for retrieving homologies in databases of proteins.
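To make the idea of index-based similarity search with redundant table lookup more concrete, here is a minimal Python sketch. It is not the published Flash or dFlash algorithm: the key layout (an anchor residue plus later residues with their offsets), the `window` and `k` parameters, the vote counting, and all function names are assumptions chosen for illustration only.

```python
from collections import Counter, defaultdict
from itertools import combinations


def tuples_at(seq, pos, window=5, k=3):
    """Yield redundant index keys anchored at `pos` (hypothetical key layout).

    Each key holds the anchor residue plus k-1 later residues taken from the
    window, together with their offsets, so one position yields several keys.
    """
    end = min(len(seq), pos + window)
    for offsets in combinations(range(pos + 1, end), k - 1):
        yield (seq[pos],) + tuple((o - pos, seq[o]) for o in offsets)


def build_index(database):
    """Map every key to the set of sequence ids that contain it."""
    index = defaultdict(set)
    for seq_id, seq in database.items():
        for pos in range(len(seq)):
            for key in tuples_at(seq, pos):
                index[key].add(seq_id)
    return index


def search(index, query, top=3):
    """Look up the query's keys and rank sequences by number of matching keys."""
    votes = Counter()
    for pos in range(len(query)):
        for key in tuples_at(query, pos):
            for seq_id in index.get(key, ()):
                votes[seq_id] += 1
    return votes.most_common(top)


# Toy database with made-up sequences, for illustration only.
database = {
    "a": "MKVLAAGIVG",
    "b": "MKVLSAGIVG",
    "c": "QQQQWWWWPP",
}
index = build_index(database)
results = search(index, "MKVLAAGIVG")
print(results)
```

Because each position contributes several overlapping keys, a single substitution in a database sequence removes only some of its keys, so near matches still collect votes. Under the same assumptions, a distributed variant could split the index across machines and merge the per-machine vote counts.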
This publication has 10 references indexed in Scilit:
- Geometric Hashing: A General And Efficient Model-based Recognition Scheme. Published by Institute of Electrical and Electronics Engineers (IEEE), 2005
- High-level language support for programming distributed systems. Published by Institute of Electrical and Electronics Engineers (IEEE), 2003
- Multidimensional indexing for recognizing visual shapes. Published by Institute of Electrical and Electronics Engineers (IEEE), 2002
- A Bayesian Approach to Model Matching with Geometric Hashing. Computer Vision and Image Understanding, 1995
- BLAZE™: An implementation of the Smith-Waterman sequence comparison algorithm on a massively parallel computer. Computers & Chemistry, 1993
- An Improved Algorithm For Approximate String Matching. SIAM Journal on Computing, 1990
- Basic Local Alignment Search Tool. Journal of Molecular Biology, 1990
- Improved tools for biological sequence comparison. Proceedings of the National Academy of Sciences, 1988
- Rapid and Sensitive Protein Similarity Searches. Science, 1985
- Rapid similarity searches of nucleic acid and protein data banks. Proceedings of the National Academy of Sciences, 1983