On the statistical significance of nucleic acid similarities

1 January 1984

journal article
research article
Published by Oxford University Press (OUP) in Nucleic Acids Research

Vol. 12 (1Part1), 215-226
https://doi.org/10.1093/nar/12.1part1.215

Abstract

When evaluating sequence similarities among nucleic acids by the usual methods, statistical significance is often found when the biological significance of the similarity is dubious. We demonstrate that the known statistical properties of nucleic acid sequences strongly affect the statistical distribution of similarity values when calculated by standard procedures. We propose a series of models which account for some of these known statistical properties. The utility of the method is demonstrated in evaluating high relative similarity scores in four specific cases in which there is little biological context by which to judge the similarities. In two of the cases we identify the statistical properties which are responsible for the apparent similarity. In the other two cases the statistical significance of the similarity persists even when the known statistical properties of sequences are modelled. For one of these cases biological significance is likely while the other case remains an enigma.
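The general idea in the abstract, that the null model chosen for "random" sequences changes how significant a similarity score appears, can be illustrated with a small Monte Carlo sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the score (length of the longest exact common substring), the two null models (a composition-preserving shuffle and a first-order Markov chain fitted to nearest-neighbour frequencies), and all function names are hypothetical stand-ins, since the abstract does not specify the authors' actual models or scoring procedure.

```python
import random
from collections import defaultdict


def similarity(a, b):
    """Length of the longest exact common substring (a simple stand-in score)."""
    best = 0
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            if x == y:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def shuffled(seq, rng):
    """Null model 1 (assumed): preserve base composition only."""
    chars = list(seq)
    rng.shuffle(chars)
    return "".join(chars)


def markov_sample(seq, rng):
    """Null model 2 (assumed): first-order Markov chain fitted to the
    nearest-neighbour frequencies of a non-empty seq."""
    followers = defaultdict(list)
    for x, y in zip(seq, seq[1:]):
        followers[x].append(y)
    out = [seq[0]]
    while len(out) < len(seq):
        # Fall back to overall composition if a base has no observed successor.
        choices = followers.get(out[-1]) or list(seq)
        out.append(rng.choice(choices))
    return "".join(out)


def empirical_p(a, b, null, trials=200, seed=0):
    """Fraction of null-model pairs scoring at least as high as the real pair."""
    rng = random.Random(seed)
    observed = similarity(a, b)
    hits = sum(
        similarity(null(a, rng), null(b, rng)) >= observed
        for _ in range(trials)
    )
    return (hits + 1) / (trials + 1)
```

Calling `empirical_p(a, b, shuffled)` and `empirical_p(a, b, markov_sample)` on the same pair gives two estimates to compare. If a similarity looks significant under the shuffle but not under the Markov model, nearest-neighbour structure alone may be enough to explain it, which is the kind of distinction the abstract describes.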

