Searching in parallel for similar strings [biological sequences]

1 January 1994

journal article
Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Computational Science and Engineering

Vol. 1 (2), 60-75
https://doi.org/10.1109/99.326666

Abstract

Distributed computation, probabilistic indexing and hashing techniques combine to create a novel approach to processing very large biological-sequence databases. Other data-intensive tasks could also benefit. Our indexing-based approach enables fast similarity searching through a large database of strings. Thanks to a redundant table-lookup scheme, recovering database items that match a test sequence requires minimal data access. We have implemented a uniprocessor version of this approach called Flash (Fast Lookup Algorithm for String Homology) as well as a distributed version, dFlash, using a cluster of seven non-dedicated workstations connected through a local area network. In this article, we present an approach for retrieving homologies in databases of proteins.
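The abstract does not spell out the table-lookup scheme, so the following is only a minimal sketch of the general idea, not Flash itself. It assumes a redundant index in which each sequence position is stored under several keys built from nearby residues, and a query is answered by voting on (sequence, alignment offset) pairs. The names `PATTERNS`, `keys_at`, `build_index`, and `search`, as well as the sampling patterns, are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical sampling patterns: offsets of the residues that form one key.
# Storing several keys per position makes the table redundant, so a match
# can still be recovered when some residues differ.
PATTERNS = [(0, 1, 2), (0, 1, 3), (0, 2, 3)]


def keys_at(seq, pos):
    """Yield (pattern_id, key) for every pattern that fits at pos."""
    for pid, pattern in enumerate(PATTERNS):
        if pos + pattern[-1] < len(seq):
            yield pid, "".join(seq[pos + off] for off in pattern)


def build_index(database):
    """Map each (pattern_id, key) to the (seq_id, position) pairs where it occurs."""
    index = defaultdict(list)
    for seq_id, seq in database.items():
        for pos in range(len(seq)):
            for entry in keys_at(seq, pos):
                index[entry].append((seq_id, pos))
    return index


def search(index, query, top=3):
    """Rank sequences by their best vote count over one alignment offset."""
    votes = Counter()
    for qpos in range(len(query)):
        for entry in keys_at(query, qpos):
            for seq_id, dpos in index.get(entry, ()):
                votes[(seq_id, dpos - qpos)] += 1
    best = {}
    for (seq_id, _offset), count in votes.items():
        best[seq_id] = max(best.get(seq_id, 0), count)
    return sorted(best.items(), key=lambda item: (-item[1], item[0]))[:top]


if __name__ == "__main__":
    # Toy protein fragments, invented for illustration only.
    database = {"a": "MKTAYIAKQR", "b": "GGGGSGGGGS", "c": "MKTAYLAKQR"}
    print(search(build_index(database), "TAYIAK"))
```

Only the table entries that share a key with the query are read, which is the sense in which a lookup of this kind touches little of the database. In a distributed setting the index could plausibly be partitioned across machines, with each one voting on its own share, but how dFlash divides the work is not stated here.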

Cited by 12 articles