A general species delimitation method with applications to phylogenetic placements
Open Access
- 29 August 2013
- journal article
- research article
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 29 (22), 2869-2876
- https://doi.org/10.1093/bioinformatics/btt499
Abstract
Motivation: Sequence-based methods to delimit species are central to DNA taxonomy, microbial community surveys and DNA metabarcoding studies. Current approaches either rely on simple sequence similarity thresholds (OTU-picking) or on complex and compute-intensive evolutionary models. The OTU-picking methods scale well on large datasets, but the results are highly sensitive to the similarity threshold. Coalescent-based species delimitation approaches often rely on Bayesian statistics and Markov Chain Monte Carlo sampling, and can therefore only be applied to small datasets.
Results: We introduce the Poisson tree processes (PTP) model to infer putative species boundaries on a given phylogenetic input tree. We also integrate PTP with our evolutionary placement algorithm (EPA-PTP) to count the number of species in phylogenetic placements. We compare our approaches with popular OTU-picking methods and the General Mixed Yule Coalescent (GMYC) model. For de novo species delimitation, the stand-alone PTP model generally outperforms GMYC as well as OTU-picking methods when evolutionary distances between species are small. PTP requires neither an ultrametric input tree nor a sequence similarity threshold as input. In the open reference species delimitation approach, EPA-PTP yields more accurate results than de novo species delimitation methods. Finally, EPA-PTP scales to large datasets because it relies on the parallel implementations of the EPA and RAxML, making it possible to delimit species in high-throughput sequencing data.
Availability and implementation: The code is freely available at www.exelixis-lab.org/software.html.
Contact: Alexandros.Stamatakis@h-its.org
Supplementary information: Supplementary data are available at Bioinformatics online.
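The abstract's remark that OTU-picking results are highly sensitive to the similarity threshold can be illustrated with a toy sketch. The code below is not the algorithm of any tool named in this record (it is not CD-HIT, and it is not PTP or GMYC); it is a minimal greedy centroid clustering written for illustration, assuming pre-aligned sequences of equal length. The function names `identity` and `greedy_otus`, the four toy sequences and the threshold values are all hypothetical.

```python
from typing import List


def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two equal-length aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return matches / len(a)


def greedy_otus(seqs: List[str], threshold: float) -> List[List[str]]:
    """Toy greedy OTU picking (illustrative assumption, not a published method).

    Sequences are processed in input order. Each sequence joins the first
    existing centroid it matches at >= threshold; otherwise it founds a new OTU.
    """
    centroids: List[str] = []
    clusters: List[List[str]] = []
    for s in seqs:
        for i, c in enumerate(centroids):
            if identity(s, c) >= threshold:
                clusters[i].append(s)
                break
        else:
            # No centroid was similar enough: s becomes a new centroid.
            centroids.append(s)
            clusters.append([s])
    return clusters


# Hypothetical toy data: 20-bp sequences differing from the first one
# at 0, 1, 2 and 4 positions respectively.
seqs = [
    "ACGTACGTACGTACGTACGT",
    "ACGTACGTACGTACGTACGA",
    "ACGTACGTACGTACGTACCA",
    "TCGTACGTACGTACGTAGCA",
]

for t in (0.97, 0.93, 0.88, 0.75):
    print(f"threshold={t:.2f} -> {len(greedy_otus(seqs, t))} OTUs")
```

On this toy input the same four sequences are split into 4, 3, 2 and 1 OTUs as the threshold drops from 0.97 to 0.75, which is the kind of threshold dependence the abstract refers to. A tree-based approach such as PTP avoids this particular parameter because, as stated above, it does not take a similarity threshold as input.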