Linguistics of Nucleotide Sequences: Morphology and Comparison of Vocabularies

1 August 1986

journal article
research article
Published by Taylor & Francis Ltd in Journal of Biomolecular Structure and Dynamics

Vol. 4 (1), 11-21
https://doi.org/10.1080/07391102.1986.10507643

Abstract

The concept of “words” in continuous languages devoid of blanks is introduced, and an operational definition of words is given. With this concept, nucleotide sequences become objects of linguistic analysis. The typical word size of the nucleotide language is found to be 3 to 5 nucleotides (tri- to pentamers). Different genomes have distinct vocabularies. Comparison of these vocabularies can serve as a basis for revealing functional and evolutionary relatedness of sequences.
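As an illustration of what comparing vocabularies could look like in practice, the following is a minimal sketch, not the authors' method. The abstract does not state the operational definition of a word, so the sketch assumes a stand-in: the most frequent overlapping k-mers of a sequence form its vocabulary, and two vocabularies are compared by Jaccard overlap. The function names (`kmer_counts`, `vocabulary`, `jaccard`), the `top` cutoff, and the toy sequences are all hypothetical.

```python
from collections import Counter


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every overlapping k-mer in seq (uppercased, no blanks)."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def vocabulary(seq: str, k: int, top: int) -> set:
    """Hypothetical 'vocabulary': the `top` most frequent k-mers.

    Assumption: frequency rank stands in for the paper's operational
    definition of a word, which the abstract does not spell out.
    """
    counts = kmer_counts(seq, k)
    # Sort by descending count, then alphabetically, so ties are deterministic.
    ranked = sorted(counts, key=lambda w: (-counts[w], w))
    return set(ranked[:top])


def jaccard(a: set, b: set) -> float:
    """Vocabulary overlap: size of intersection over size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


if __name__ == "__main__":
    # Toy sequences invented for illustration; not real genomic data.
    seq_a = "ATGATGATGCCCATGCCC"
    seq_b = "ATGATGTTTTTTATGTTT"
    for k in (3, 4, 5):  # word sizes in the range the abstract reports
        overlap = jaccard(vocabulary(seq_a, k, 5), vocabulary(seq_b, k, 5))
        print(k, round(overlap, 2))
```

A frequency cutoff is the simplest possible choice here; a closer analogue of a linguistic "word" would probably compare observed k-mer counts with those expected from base composition, which this sketch does not attempt.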

Cited by 122 articles