Accurate phylogenetic classification of variable-length DNA fragments
- 10 December 2006
- journal article
- research article
- Published by Springer Science and Business Media LLC in Nature Methods
- Vol. 4 (1), 63-72
- https://doi.org/10.1038/nmeth976
Abstract
Metagenome studies have retrieved vast amounts of sequence data from a variety of environments, leading to new discoveries and insights into the uncultured microbial world. Except for very simple communities, the encountered diversity has made fragment assembly and the subsequent analysis a challenging problem. A taxonomic characterization of metagenomic fragments is required for a deeper understanding of shotgun-sequenced microbial communities, but success has mostly been limited to sequences containing phylogenetic marker genes. Here we present PhyloPythia, a composition-based classifier that combines higher-level generic clades from a set of 340 completed genomes with sample-derived population models. Extensive analyses on synthetic and real metagenome data sets showed that PhyloPythia allows the accurate classification of most sequence fragments across all considered taxonomic ranks, even for unknown organisms. The method requires no more than 100 kb of training sequence for the creation of accurate models of sample-specific populations and can assign fragments ≥1 kb with high specificity.
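To illustrate what "composition-based" can mean in practice, the following is a minimal sketch that turns a DNA fragment into a vector of normalized oligonucleotide (k-mer) frequencies, which a classifier could then take as input. The abstract does not state which features or which learning method PhyloPythia uses, so the function name `kmer_profile`, the default `k=4`, and the handling of ambiguous bases are all assumptions made for illustration only.

```python
from collections import Counter
from itertools import product


def kmer_profile(seq, k=4):
    """Return normalized k-mer frequencies of `seq` over the ACGT alphabet.

    Hypothetical helper for illustration; not the published PhyloPythia code.
    Windows containing characters other than A, C, G, T (e.g. N) are ignored.
    """
    seq = seq.upper()
    # Enumerate all 4**k possible k-mers so every profile has the same keys.
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    # Count every overlapping window of length k in the fragment.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Only windows made entirely of A, C, G, T contribute to the total.
    total = sum(counts[km] for km in kmers)
    if total == 0:
        return {km: 0.0 for km in kmers}
    return {km: counts[km] / total for km in kmers}


if __name__ == "__main__":
    profile = kmer_profile("ACGTACGT", k=2)
    print(profile["AC"], profile["TA"])
```

Fragments of different lengths map to vectors of the same dimension (4**k), which is one way a single model could handle variable-length input; whether the published method normalizes in this way is not stated in the abstract.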