hybridSPAdes: an algorithm for hybrid assembly of short and long reads
Open Access
- 20 November 2015
- journal article
- research article
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 32 (7), 1009-1015
- https://doi.org/10.1093/bioinformatics/btv688
Abstract
Motivation: Recent advances in single molecule real-time (SMRT) and nanopore sequencing technologies have enabled high-quality assemblies from long and inaccurate reads. However, these approaches require high coverage by long reads and remain expensive. On the other hand, inexpensive short-read technologies produce accurate but fragmented assemblies. Thus, a hybrid approach that assembles long reads (with low coverage) and short reads has the potential to generate high-quality assemblies at reduced cost.
Results: We describe the hybridSPAdes algorithm for assembling short and long reads and benchmark it on a variety of bacterial assembly projects. Our results demonstrate that hybridSPAdes generates accurate assemblies (even in projects with relatively low coverage by long reads), thus reducing the overall cost of genome sequencing. We further present the first complete assembly of a genome from single cells using SMRT reads.
Availability and implementation: hybridSPAdes is implemented in C++ as a part of the SPAdes genome assembler and is publicly available at http://bioinf.spbau.ru/en/spades
Contact: d.antipov@spbu.ru
Supplementary information: Supplementary data are available at Bioinformatics online.
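Since hybridSPAdes ships as part of the SPAdes assembler, a hybrid run amounts to passing both short-read and long-read libraries to the same command. The following is a minimal sketch, not taken from the paper, of how such a command line might be put together in Python. The helper name `build_hybrid_spades_command` and all file names are hypothetical; the options `-1`, `-2`, `--pacbio`, `--nanopore`, `--sc` and `-o` are those described in the SPAdes manual, and their availability should be checked against the installed version.

```python
import shlex


def build_hybrid_spades_command(short_r1, short_r2, long_reads, out_dir,
                                long_read_type="pacbio", single_cell=False):
    """Return an argument list for a hybrid SPAdes run (hypothetical helper).

    short_r1, short_r2: paired-end short-read files (forward and reverse).
    long_reads: file with long reads (SMRT/PacBio or nanopore).
    long_read_type: "pacbio" or "nanopore", mapped to --pacbio / --nanopore.
    single_cell: add --sc for single-cell (MDA) data sets.
    """
    if long_read_type not in ("pacbio", "nanopore"):
        raise ValueError("long_read_type must be 'pacbio' or 'nanopore'")

    cmd = ["spades.py"]
    if single_cell:
        cmd.append("--sc")
    cmd += ["-1", short_r1, "-2", short_r2]
    cmd += ["--" + long_read_type, long_reads]
    cmd += ["-o", out_dir]
    return cmd


if __name__ == "__main__":
    # File names below are placeholders, not data sets from the paper.
    cmd = build_hybrid_spades_command(
        "illumina_R1.fastq", "illumina_R2.fastq",
        "smrt_reads.fastq", "hybrid_assembly",
    )
    # Print the command instead of running it, so the sketch works
    # without SPAdes installed; pass `cmd` to subprocess.run to execute.
    print(shlex.join(cmd))
```

Building the command as a list rather than a single string avoids shell quoting problems when file paths contain spaces.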