Markov chain analysis finds a significant influence of neighboring bases on the occurrence of a base in eucaryotic nuclear DNA sequences both protein-coding and noncoding
- 1 April 1985
- journal article
- research article
- Published by Springer Science and Business Media LLC in Journal of Molecular Evolution
- Vol. 21 (3), 278-288
- https://doi.org/10.1007/bf02102360
Abstract
Sixty-four eucaryotic nuclear DNA sequences, half of them coding and half noncoding, have been examined as expressions of first-, second-, or third-order Markov chains. Standard statistical tests found that most of the sequences required at least second-order Markov chains for their representation, and some required chains of third order. For all 64 sequences the observed one-step second-order transition count matrices were effective in predicting the two-step transition count matrices, and 56 of 64 were effective in predicting the three-step transition count matrices. The departure from random expectation of the observed first- and second-order transition count matrices meant that a considerable sample of eucaryotic nuclear DNA sequences, both protein coding and noncoding, have significant local structure over subsequences of three to five contiguous bases, and that this structure occurs throughout the total length of the sequence. These results suggested that present DNA sequences may have arisen from the duplication, concatenation, and gradual modification of very early short sequences.
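The abstract does not name the "standard statistical tests" used, so the following is only an illustrative sketch of one common way to compare Markov chain orders for a DNA sequence: a likelihood-ratio statistic computed from transition counts. The function names (`transition_counts`, `log_likelihood`, `lr_statistic`, `degrees_of_freedom`) and the toy sequence are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def transition_counts(seq, order, start):
    """Count (context, next_base) pairs, where context is the `order`
    bases preceding each position from `start` to the end of `seq`."""
    counts = Counter()
    for i in range(start, len(seq)):
        context = seq[i - order:i]  # empty string when order == 0
        counts[(context, seq[i])] += 1
    return counts


def log_likelihood(seq, order, start):
    """Maximised log-likelihood of seq[start:] under a Markov chain of
    the given order, using observed frequencies as the estimates."""
    counts = transition_counts(seq, order, start)
    context_totals = Counter()
    for (context, _), n in counts.items():
        context_totals[context] += n
    return sum(
        n * math.log(n / context_totals[context])
        for (context, _), n in counts.items()
    )


def lr_statistic(seq, low, high):
    """Likelihood-ratio statistic for order `low` (null) against order
    `high` (alternative). Both models are scored on the same positions,
    starting at index `high`, so the models are nested."""
    start = high
    return 2.0 * (log_likelihood(seq, high, start) - log_likelihood(seq, low, start))


def degrees_of_freedom(low, high, alphabet_size=4):
    """Nominal degrees of freedom for the comparison, assuming every
    context of length `high` is actually observed in the sequence."""
    return (alphabet_size ** high - alphabet_size ** low) * (alphabet_size - 1)


if __name__ == "__main__":
    # Toy sequence (an assumption for illustration, not data from the paper).
    toy = "ACGT" * 50
    for low, high in [(0, 1), (1, 2), (2, 3)]:
        print(low, high, lr_statistic(toy, low, high), degrees_of_freedom(low, high))
```

Under the usual large-sample approximation, the statistic would be compared with a chi-square distribution on the stated degrees of freedom; a large value suggests the lower order is insufficient. Short sequences leave many contexts unobserved, in which case the nominal degrees of freedom overstate the effective ones.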