ABySS: A parallel assembler for short read sequence data
- 27 February 2009
- journal article
- Published by Cold Spring Harbor Laboratory in Genome Research
- Vol. 19 (6), 1117-1123
- https://doi.org/10.1101/gr.089532.108
Abstract
Widespread adoption of massively parallel deoxyribonucleic acid (DNA) sequencing instruments has prompted the recent development of de novo short read assembly algorithms. A common shortcoming of the available tools is their inability to efficiently assemble vast amounts of data generated from large-scale sequencing projects, such as the sequencing of individual human genomes to catalog natural genetic variation. To address this limitation, we developed ABySS (Assembly By Short Sequences), a parallelized sequence assembler. As a demonstration of the capability of our software, we assembled 3.5 billion paired-end reads from the genome of an African male publicly released by Illumina, Inc. Approximately 2.76 million contigs ≥100 base pairs (bp) in length were created with an N50 size of 1499 bp, representing 68% of the reference human genome. Analysis of these contigs identified polymorphic and novel sequences not present in the human reference assembly, which were validated by alignment to alternate human assemblies and to other primate genomes.
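The N50 figure quoted in the abstract can be illustrated with a short sketch. It assumes the commonly used definition (the length of the contig at which the running total of lengths, taken from longest to shortest, first reaches half of the summed length); the abstract does not spell out the exact convention used. The function name `n50`, the `min_length` parameter, and the toy lengths are illustrative only and are not taken from ABySS.

```python
def n50(lengths, min_length=100):
    """Return the N50 of the contig lengths that are >= min_length.

    Assumed definition: sort the kept lengths from longest to shortest and
    return the length at which the running total first reaches half of the
    summed length. Returns 0 when no contig passes the length filter.
    """
    # Drop contigs below the reporting threshold (the abstract uses >= 100 bp).
    kept = sorted((n for n in lengths if n >= min_length), reverse=True)
    half_total = sum(kept) / 2

    running = 0
    for length in kept:
        running += length
        if running >= half_total:
            return length
    return 0


# Toy data, not real assembly output: the 50 bp contig is filtered out.
toy_lengths = [50, 100, 200, 300, 400]
print(n50(toy_lengths))
```

With the toy lengths, the kept contigs sum to 1000 bp, and the running total first reaches 500 bp at the 300 bp contig, so the sketch reports 300.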