Genome structure described by formal languages

Abstract
Nucleic acid sequences can be regarded as words over the alphabet of nucleotides. Naturally occurring DNAs and RNAs form subsets of the set of all possible words. Formal languages are proposed as a means of describing the structure of these subsets. Regular languages defined by finite automata are introduced, and the concept is demonstrated by applying it to RNA phages of group I. This approach permits a concise characterization of grammatical patterns in genetic information.
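
To make the idea of a regular language over the nucleotide alphabet concrete, the following is a minimal sketch of a deterministic finite automaton. It is a toy illustration only and is not the automaton used in the paper for the group I RNA phages: the chosen language (all RNA words that contain the start codon AUG), the state numbering, and the names ALPHABET, TRANSITIONS, ACCEPTING, and accepts are assumptions introduced here.

```python
# Toy example (assumption, not taken from the paper): a DFA over the RNA
# alphabet {A, C, G, U} accepting every word that contains the codon AUG.

ALPHABET = "ACGU"

# States record how much of "AUG" has just been read:
#   0 = nothing, 1 = "A", 2 = "AU", 3 = "AUG" seen (accepting, absorbing).
# Symbols missing from a state's table send the automaton back to state 0.
TRANSITIONS = {
    0: {"A": 1},
    1: {"A": 1, "U": 2},
    2: {"A": 1, "G": 3},
    3: {symbol: 3 for symbol in ALPHABET},
}
ACCEPTING = {3}


def accepts(word: str) -> bool:
    """Return True if the word belongs to the toy language."""
    state = 0
    for symbol in word:
        if symbol not in ALPHABET:
            raise ValueError(f"not a nucleotide symbol: {symbol!r}")
        state = TRANSITIONS[state].get(symbol, 0)
    return state in ACCEPTING


if __name__ == "__main__":
    for word in ("GGAUGC", "AAUG", "AUAUC", ""):
        print(repr(word), accepts(word))
```

The set of words accepted by such an automaton is one subset of all possible words over the alphabet. Describing naturally occurring sequences would call for a far richer automaton, but the mechanism of reading one nucleotide at a time and changing state is the same.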
