Choosing Appropriate Substitution Models for the Phylogenetic Analysis of Protein-Coding Sequences
Open Access
- 21 September 2005
- journal article
- research article
- Published by Oxford University Press (OUP) in Molecular Biology and Evolution
- Vol. 23 (1), 7-9
- https://doi.org/10.1093/molbev/msj021
Abstract
Although phylogenetic inference of protein-coding sequences continues to dominate the literature, few analyses incorporate evolutionary models that consider the genetic code. This problem is exacerbated by the exclusion of codon-based models from commonly employed model selection techniques, presumably due to the computational cost associated with codon models. We investigated an efficient alternative to standard nucleotide substitution models, in which codon position (CP) is incorporated into the model. We determined the most appropriate model for alignments of 177 RNA virus genes and 106 yeast genes, using 11 substitution models including one codon model and four CP models. The majority of analyzed gene alignments are best described by CP substitution models, rather than by standard nucleotide models, and without the computational cost of full codon models. These results have significant implications for phylogenetic inference of coding sequences as they make it clear that substitution models incorporating CPs not only are a computationally realistic alternative to standard models but may also frequently be statistically superior.
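As a rough illustration of what incorporating codon position into a nucleotide model involves, the sketch below splits an in-frame alignment into three partitions (first, second, and third codon positions) so that each could be given its own substitution parameters. It also ranks candidate models by the Akaike Information Criterion. The abstract does not state which selection criterion was used, so AIC is an assumption here, and the function names, taxon labels, and likelihood values are hypothetical placeholders rather than anything taken from the paper.

```python
def split_by_codon_position(alignment):
    """Split in-frame sequences into three partitions, one per codon position.

    alignment: dict mapping taxon name -> aligned coding sequence (string).
    Returns a dict {1: {...}, 2: {...}, 3: {...}} of per-position sub-alignments.
    """
    partitions = {1: {}, 2: {}, 3: {}}
    for name, seq in alignment.items():
        if len(seq) % 3 != 0:
            raise ValueError(f"{name}: length {len(seq)} is not a multiple of 3")
        for pos in (1, 2, 3):
            # Every third site, starting at the given codon position.
            partitions[pos][name] = seq[pos - 1::3]
    return partitions


def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 lnL (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def best_model(fits):
    """Return the model name with the lowest AIC.

    fits: dict mapping model name -> (log_likelihood, n_params).
    """
    return min(fits, key=lambda name: aic(*fits[name]))


# Toy alignment (hypothetical sequences, three codons each).
toy = {"taxonA": "ATGGCCAAA", "taxonB": "ATGGCTAAG"}
parts = split_by_codon_position(toy)

# Placeholder fits for illustration only; these are NOT results from the paper.
placeholder_fits = {
    "model_a": (-5000.0, 10),
    "model_b": (-4900.0, 30),
}
chosen = best_model(placeholder_fits)
```

In practice the log-likelihoods would come from a phylogenetics package fitted to each partitioning scheme. The sketch covers only the bookkeeping around partitioning and model comparison, not the likelihood calculation itself.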