Codon catalog usage and the genome hypothesis

Abstract
Frequencies for each of the 61 amino acid codons have been determined in every published mRNA sequence of 50 or more codons. The frequencies are shown for each kind of genome and for each individual gene. A surprising consistency of choices exists among genes of the same or similar genomes. Thus each genome, or kind of genome, appears to possess a “system” for choosing between codons. Frameshift genes, however, have choice strategies widely different from those of normal genes. Our work indicates that the main factors distinguishing between mRNA sequences relate to choices among degenerate bases. These systematic third base choices can therefore be used to establish a new kind of genetic distance, which reflects differences in coding strategy. The choice patterns we find seem compatible with the idea that the genome and not the individual gene is the unit of selection. Each gene in a genome tends to conform to its species' usage of the codon catalog; this is our genome hypothesis.
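As an illustration of the kind of tabulation described above, the following minimal Python sketch counts the 61 sense codons of a coding sequence and compares two sequences by their codon usage. It is not the procedure used in this work. The function names (`codon_counts`, `codon_frequencies`, `usage_distance`) are hypothetical, the standard genetic code and an in-frame reading from the first base are assumed, and the distance shown (half the summed absolute difference in codon frequencies) is a simple stand-in chosen for clarity rather than the genetic distance referred to in the abstract.

```python
from collections import Counter

# Standard genetic code in TCAG order (assumption: no organelle or variant codes).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
GENETIC_CODE = dict(zip(CODONS, AMINO))

# The 61 codons that specify an amino acid (stop codons excluded).
SENSE_CODONS = [c for c in CODONS if GENETIC_CODE[c] != "*"]


def codon_counts(seq):
    """Count sense codons in an in-frame sequence; RNA (U) is read as DNA (T)."""
    seq = seq.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    triplets = (seq[i:i + 3] for i in range(0, usable, 3))
    # Stop codons and triplets with ambiguous bases are skipped.
    return Counter(t for t in triplets if GENETIC_CODE.get(t, "*") != "*")


def codon_frequencies(seq, min_codons=50):
    """Relative frequency of each of the 61 sense codons.

    The default threshold mirrors the 50-codon minimum stated in the abstract.
    """
    counts = codon_counts(seq)
    total = sum(counts.values())
    if total < min_codons:
        raise ValueError(f"only {total} sense codons; need at least {min_codons}")
    return {c: counts[c] / total for c in SENSE_CODONS}


def usage_distance(seq_a, seq_b, min_codons=50):
    """Illustrative distance in [0, 1]: half the L1 difference of frequencies.

    This is an assumed, simplified measure, not the authors' distance.
    """
    fa = codon_frequencies(seq_a, min_codons)
    fb = codon_frequencies(seq_b, min_codons)
    return 0.5 * sum(abs(fa[c] - fb[c]) for c in SENSE_CODONS)
```

Under these assumptions, two genes that encode the same protein but differ in every third base choice would score a distance of 1, while two genes with identical codon frequencies would score 0. A measure restricted to choices within each synonymous codon family would isolate degenerate-base preferences more cleanly, at the cost of a longer example.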