An ensemble strategy that significantly improves de novo assembly of microbial genomes from metagenomic next-generation sequencing data
Open Access
- 13 January 2015
- journal article
- research article
- Published by Oxford University Press (OUP) in Nucleic Acids Research
- Vol. 43 (7), e46
- https://doi.org/10.1093/nar/gkv002
Abstract
Next-generation sequencing (NGS) approaches rapidly produce millions to billions of short reads, which allow pathogen detection and discovery in human clinical, animal and environmental samples. A major limitation of sequence homology-based identification for highly divergent microorganisms is the short length of reads generated by most highly parallel sequencing technologies. Short reads require a high level of sequence similarity to annotated genes to confidently predict gene function or homology. Recognition of highly divergent homologues can be improved by reference-free (de novo) assembly of short overlapping sequence reads into larger contigs. We describe an ensemble strategy that integrates the sequential use of various de Bruijn graph and overlap-layout-consensus assemblers with a novel partitioned sub-assembly approach. We also propose new quality metrics that are suitable for evaluating metagenome de novo assembly. We demonstrate that this new ensemble strategy, tested using in silico spike-in, clinical and environmental NGS datasets, achieved significantly better contigs than current approaches.
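To make the idea of assembling short overlapping reads into a longer contig concrete, the following is a minimal toy sketch of de Bruijn graph assembly in Python. It is not the ensemble strategy described in the abstract and does not reproduce any of the assemblers listed on this page. The function names (`build_de_bruijn`, `extend_contig`), the example reads and the choice of k = 4 are all assumptions made for illustration. The sketch also ignores reverse complements, sequencing errors, coverage and in-degree checks, all of which a real assembler has to handle.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Link each (k-1)-mer to the (k-1)-mers that follow it in at least one read."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Edge from the k-mer's prefix to its suffix (both of length k-1).
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def extend_contig(graph, start):
    """Walk forward from `start` while the path is unambiguous.

    Stops at a branch (more than one successor), a dead end, or a
    node that was already visited (cycle).
    """
    contig = start
    node = start
    seen = {start}
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if nxt in seen:
            break
        contig += nxt[-1]  # each step adds exactly one new base
        seen.add(nxt)
        node = nxt
    return contig


# Three hypothetical overlapping reads drawn from one short sequence.
reads = ["ATGGCGT", "GGCGTGC", "GTGCAAT"]
graph = build_de_bruijn(reads, k=4)
contig = extend_contig(graph, "ATG")
print(contig)
```

In this toy case the three 7-base reads are merged into a single 12-base contig. A longer contig of this kind offers more sequence for homology searches than any individual read, which is the motivation the abstract gives for de novo assembly.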