Joint Bayesian Estimation of Alignment and Phylogeny
Open Access
- 1 June 2005
- journal article
- research article
- Published by Oxford University Press (OUP) in Systematic Biology
- Vol. 54 (3), 401-418
- https://doi.org/10.1080/10635150590947041
Abstract
We describe a novel model and algorithm for simultaneously estimating multiple molecular sequence alignments and the phylogenetic trees that relate the sequences. Unlike current techniques that base phylogeny estimates on a single estimate of the alignment, we take alignment uncertainty into account by considering all possible alignments. Furthermore, because the alignment and phylogeny are constructed simultaneously, a guide tree is not needed. This sidesteps the problem in which alignments created by progressive alignment are biased toward the guide tree used to generate them. Joint estimation also allows us to model rate variation between sites when estimating the alignment and to use the evidence in shared insertions/deletions (indels) to group sister taxa in the phylogeny. Our indel model makes use of affine gap penalties and considers indels of multiple letters. We make the simplifying assumption that the indel process is identical on all branches. As a result, the probability of a gap is independent of branch length. We use a Markov chain Monte Carlo (MCMC) method to sample from the posterior of the joint model, estimating the most probable alignment and tree and their support simultaneously. We describe a new MCMC transition kernel that improves our algorithm's mixing efficiency, allowing the MCMC chains to converge even when started from arbitrary alignments. Our software implementation can estimate alignment uncertainty, and we describe a method for summarizing this uncertainty in a single plot.
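The link between affine gap penalties and a probabilistic indel model can be illustrated with a short calculation. This is a minimal sketch under an assumption of ours, not the paper's exact parameterization: a gap opens with probability `delta` and each further residue extends it with probability `epsilon`, so gap lengths are geometric. The function names `gap_length_log_prob` and `affine_penalty` are hypothetical. Under this assumption, the negative log-probability of a gap of length L equals an opening cost, -log(delta) - log(1 - epsilon), plus (L - 1) times an extension cost, -log(epsilon), which is exactly an affine gap penalty. Neither function takes a branch length, mirroring the simplifying assumption that the probability of a gap is independent of branch length.

```python
import math
from typing import Tuple


def gap_length_log_prob(length: int, delta: float, epsilon: float) -> float:
    """Log-probability of a single gap of exactly `length` residues.

    Hypothetical model: the gap opens with probability `delta`, each extra
    residue extends it with probability `epsilon`, and it closes with
    probability 1 - epsilon. No branch length appears, matching the
    assumption that gap probability does not depend on branch length.
    """
    if length < 1:
        raise ValueError("gap length must be >= 1")
    if not (0.0 < delta < 1.0 and 0.0 < epsilon < 1.0):
        raise ValueError("delta and epsilon must lie strictly between 0 and 1")
    return math.log(delta) + (length - 1) * math.log(epsilon) + math.log(1.0 - epsilon)


def affine_penalty(delta: float, epsilon: float) -> Tuple[float, float]:
    """Return (open_cost, extend_cost) such that, for every length L >= 1,
    -gap_length_log_prob(L, delta, epsilon) == open_cost + extend_cost * (L - 1).
    """
    open_cost = -math.log(delta) - math.log(1.0 - epsilon)
    extend_cost = -math.log(epsilon)
    return open_cost, extend_cost


if __name__ == "__main__":
    delta, epsilon = 0.05, 0.6  # hypothetical parameter values for illustration
    open_cost, extend_cost = affine_penalty(delta, epsilon)
    for L in range(1, 5):
        model_cost = -gap_length_log_prob(L, delta, epsilon)
        affine_cost = open_cost + extend_cost * (L - 1)
        print(f"L={L}: model cost {model_cost:.4f}, affine cost {affine_cost:.4f}")
```

Raising `epsilon` lowers the extension cost and raises the opening cost, so one long gap becomes cheaper relative to several short ones, which is the behaviour affine gap penalties are designed to capture.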