Computer analysis of nucleic acid regulatory sequences.

Abstract
We describe a computer program designed to facilitate the analysis of nucleic acid sequences. The program can search several nucleic acid sequences for oligonucleotides common to all of them. It can examine a DNA or RNA sequence for two kinds of homologous regions: repetitions and dyad symmetries. The homologies need not be perfect: mismatches and "looping out" of nucleotides are allowed. The program also finds (A+T)- and (G+C)-rich regions, locates restriction enzyme recognition sites, determines the distribution of di- and trinucleotides, and performs various other functions. We include two representative applications of the program. In the first, all prokaryotic transcription termination sequences published as of June 1977 were found to share the following features: (i) a string of at least five T residues, (ii) the sequence CGGGC or a close analog immediately preceding the T cluster, (iii) a region of strong dyad symmetry preceding the T residues and including the CGGGC sequence. In the second, a sequence of 221 nucleotides consisting of the Escherichia coli trp promoter, operator, and leader was found to contain two strong dyad symmetries. Both of these homologies occur at known regulatory sites; no comparable homologies occur in regions without regulatory significance.
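To make two of the searches named above concrete, the following is a minimal Python sketch, not the authors' program. It assumes DNA written with the letters A, C, G, and T, and it handles only perfect matches: the mismatches and "looping out" that the program allows are omitted. The function names (`common_oligos`, `dyad_symmetries`), the parameters (`k`, `arm`, `max_gap`), and the toy sequences are all hypothetical and chosen only for illustration; the sequences are not real terminators.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    comp = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(comp[base] for base in reversed(seq))


def common_oligos(seqs, k):
    """Return the set of oligonucleotides of length k present in every sequence."""
    kmer_sets = [
        {s[i:i + k] for i in range(len(s) - k + 1)}
        for s in seqs
    ]
    return set.intersection(*kmer_sets)


def dyad_symmetries(seq, arm, max_gap):
    """Find perfect dyad symmetries.

    A hit is a left arm of length `arm`, followed by 0..max_gap spacer
    nucleotides, followed by the reverse complement of the left arm.
    Returns a list of (left_start, right_start, left_arm, gap) tuples
    using 0-based positions.
    """
    hits = []
    n = len(seq)
    for i in range(n - 2 * arm + 1):
        left = seq[i:i + arm]
        target = reverse_complement(left)
        for gap in range(max_gap + 1):
            j = i + arm + gap
            if j + arm > n:
                break  # right arm would run past the end of the sequence
            if seq[j:j + arm] == target:
                hits.append((i, j, left, gap))
    return hits


if __name__ == "__main__":
    # Toy sequences (invented for this example) that share the pentamer CGGGC.
    toy_set = ["AACGGGCTT", "TTCGGGCAA", "GCGGGCTA"]
    print(common_oligos(toy_set, 5))

    # Toy sequence: GCCCG, a 4-nucleotide spacer, CGGGC, then a run of T residues.
    toy_seq = "GCCCGAAAACGGGCTTTTT"
    print(dyad_symmetries(toy_seq, arm=5, max_gap=4))
```

Scanning every start position against every allowed spacer length is the simplest approach and is adequate for sequences of a few hundred nucleotides, such as the 221-nucleotide trp region. Tolerating mismatches would mean replacing the exact string comparison with a count of differing positions, and "looping out" would call for an alignment-style comparison.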