Pattern recognition in nucleic acid sequences. I. A general method for finding local homologies and symmetries

Abstract
We present an algorithm, a generalization of the Needleman-Wunsch-Sellers algorithm, that finds within longer sequences all subsequences that resemble one another locally. The probability that so close a resemblance would occur by chance alone is calculated and used to classify these local homologies according to statistical significance. Repeats and inverted repeats may also be found. Results for both random and biological nucleic acid sequences are presented. Fourteen complete genomes are analyzed for dyad symmetries.
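
To make the idea of a local resemblance concrete, the following is a minimal illustrative sketch and not the algorithm of this paper. It uses a Smith-Waterman-style similarity recurrence as a stand-in, with assumed scores (match = 1, mismatch = -1, gap = -1), and it reports only the best score rather than all local homologies or their statistical significance. The function names `local_alignment_score` and `reverse_complement` are hypothetical.

```python
def local_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Best local similarity score between any substring of a and any substring of b.

    Smith-Waterman-style recurrence; the scoring values are assumptions
    chosen for illustration, not parameters taken from the paper.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Clamping at 0 lets an alignment start anywhere (local, not global).
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def reverse_complement(seq):
    """Reverse complement of a DNA string over the alphabet A, C, G, T."""
    comp = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(comp[base] for base in reversed(seq))


# An inverted repeat (dyad symmetry) shows up as a local match between a
# sequence and its own reverse complement.
seq = "GGATCCAAAA"
score = local_alignment_score(seq, reverse_complement(seq))
```

Here `GGATCC` is its own reverse complement, so comparing `seq` with its reverse complement yields a local match covering those six bases. Finding direct repeats by comparing a sequence with itself would additionally require excluding the trivial self-match, which this sketch does not attempt.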