Identification and assembly of genomes and genetic elements in complex metagenomic samples without using reference genomes
- 31 July 2014
- journal article
- research article
- Published by Springer Science and Business Media LLC in Nature Biotechnology
- Vol. 32 (8), 822-828
- https://doi.org/10.1038/nbt.2939
Abstract
Most current approaches for analyzing metagenomic data rely on comparisons to reference genomes, but the microbial diversity of many environments extends far beyond what is covered by reference databases. De novo segregation of complex metagenomic data into specific biological entities, such as particular bacterial strains or viruses, remains a largely unsolved problem. Here we present a method, based on binning co-abundant genes across a series of metagenomic samples, that enables comprehensive discovery of new microbial organisms, viruses and co-inherited genetic entities and aids assembly of microbial genomes without the need for reference sequences. We demonstrate the method on data from 396 human gut microbiome samples and identify 7,381 co-abundance gene groups (CAGs), including 741 metagenomic species (MGS). We use these to assemble 238 high-quality microbial genomes and identify affiliations between MGS and hundreds of viruses or genetic entities. Our method provides the means for comprehensive profiling of the diversity within complex metagenomic samples.
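The idea in the abstract of binning co-abundant genes across a series of samples can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the clustering procedure, so the greedy seed-based grouping, the Pearson correlation measure, the `min_corr` threshold of 0.9, and the function names `pearson` and `bin_coabundant_genes` are all assumptions made for illustration, and the abundance values are toy data.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length abundance profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no co-abundance signal
    return cov / sqrt(var_x * var_y)


def bin_coabundant_genes(profiles, min_corr=0.9):
    """Greedily group genes whose profiles correlate with a seed gene.

    profiles: dict mapping gene id -> list of abundances, one per sample.
    min_corr: hypothetical threshold, chosen here only for illustration.
    """
    unassigned = list(profiles)  # keeps the dict's insertion order
    groups = []
    while unassigned:
        seed = unassigned.pop(0)
        group, remaining = [seed], []
        for gene in unassigned:
            if pearson(profiles[seed], profiles[gene]) >= min_corr:
                group.append(gene)
            else:
                remaining.append(gene)
        unassigned = remaining
        groups.append(group)
    return groups


# Toy data: four genes measured across four samples (made-up values).
profiles = {
    "geneA": [10, 0, 5, 20],
    "geneB": [21, 1, 9, 40],
    "geneC": [0, 8, 7, 1],
    "geneD": [1, 15, 14, 2],
}
groups = bin_coabundant_genes(profiles)
```

In this toy input, geneA and geneB rise and fall together across the samples, as do geneC and geneD, so the sketch places them in two separate groups. Genes from the same organism are expected to share an abundance profile across samples, which is why co-abundance can stand in for a reference genome when grouping them.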