Efficient de novo assembly of highly heterozygous genomes from whole-genome shotgun short reads
Open Access
- 22 April 2014
- journal article
- Published by Cold Spring Harbor Laboratory in Genome Research
- Vol. 24 (8), 1384-1395
- https://doi.org/10.1101/gr.170720.113
Abstract
Although many de novo genome assembly projects have recently been conducted using high-throughput sequencers, assembling highly heterozygous diploid genomes remains a substantial challenge because heterozygosity increases the complexity of the de Bruijn graph structure that most assemblers use. To meet the growing demand for sequencing nonmodel and/or wild-type samples, inbred lines or fosmid-based hierarchical sequencing methods are used in most cases to overcome this problem. However, these methods are costly and time consuming and forfeit the advantages of massively parallel sequencing. Here, we describe a novel de novo assembler, Platanus, that can effectively handle high-throughput data from heterozygous samples. Platanus assembles DNA fragments (reads) into contigs by constructing de Bruijn graphs with automatically optimized k-mer sizes and then scaffolds the contigs using paired-end information. The complicated graph structures that result from heterozygosity are simplified not only during the contig assembly step but also during the scaffolding step. We evaluated the assembly results on eukaryotic samples with various levels of heterozygosity. Compared with other assemblers, Platanus yields a larger scaffold NG50 length without any accompanying loss of accuracy on both simulated and real data. In addition, Platanus recorded the largest scaffold NG50 values for two of the three low-heterozygosity species used in the de novo assembly contest Assemblathon 2. Platanus therefore provides a novel and efficient approach for the assembly of gigabase-sized highly heterozygous genomes and is an attractive alternative to existing assemblers designed for genomes of lower heterozygosity.
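To illustrate why heterozygosity complicates a de Bruijn graph, the sketch below builds a small graph from two haplotypes that differ by a single SNP. The SNP produces a "bubble": one node branches into two parallel paths that reconvene downstream, which breaks what would otherwise be one linear contig. This is a minimal sketch under stated assumptions: error-free reads, a fixed k (Platanus optimizes k automatically, which is not attempted here), and a generic "keep the higher-coverage branch" rule for popping simple bubbles. All function names are hypothetical, and this is not Platanus's actual algorithm.

```python
def build_debruijn(seqs, k):
    """Nodes are (k-1)-mers; each k-mer adds an edge prefix -> suffix.
    The edge value counts how many times that k-mer was seen (coverage)."""
    graph = {}
    for s in seqs:
        for i in range(len(s) - k + 1):
            kmer = s[i:i + k]
            succs = graph.setdefault(kmer[:-1], {})
            succs[kmer[1:]] = succs.get(kmer[1:], 0) + 1
    return graph


def in_degrees(graph):
    indeg = {}
    for succs in graph.values():
        for v in succs:
            indeg[v] = indeg.get(v, 0) + 1
    return indeg


def find_simple_bubbles(graph):
    """Return (start, end, [path_a, path_b]) for every node with out-degree 2
    whose two branches run through non-branching nodes to the same end node."""
    indeg = in_degrees(graph)
    bubbles = []
    for start, succs in graph.items():
        if len(succs) != 2:
            continue
        paths = []
        for nxt in succs:
            path, node = [start, nxt], nxt
            # Advance while the node has exactly one predecessor and one successor.
            while indeg.get(node, 0) == 1 and len(graph.get(node, {})) == 1:
                node = next(iter(graph[node]))
                path.append(node)
            paths.append(path)
        if paths[0][-1] == paths[1][-1]:
            bubbles.append((start, paths[0][-1], paths))
    return bubbles


def path_coverage(graph, path):
    # A branch is only as well supported as its weakest k-mer.
    return min(graph[u][v] for u, v in zip(path, path[1:]))


def pop_bubbles(graph):
    """Hypothetical simplification rule: delete the lower-coverage branch."""
    for _start, _end, paths in find_simple_bubbles(graph):
        weak = min(paths, key=lambda p: path_coverage(graph, p))
        for u, v in zip(weak, weak[1:]):
            del graph[u][v]
        for node in weak[1:-1]:
            graph.pop(node, None)  # interior nodes are now disconnected


def spell_linear_path(graph, start):
    """Spell a contig by following nodes that have a single successor."""
    seq, node, seen = start, start, {start}
    while len(graph.get(node, {})) == 1:
        node = next(iter(graph[node]))
        if node in seen:  # guard against cycles
            break
        seen.add(node)
        seq += node[-1]
    return seq


if __name__ == "__main__":
    hap1 = "ATGCAGTCCTAGG"  # toy haplotypes differing at index 6 (T/A)
    hap2 = "ATGCAGACCTAGG"
    g = build_debruijn([hap1] * 3 + [hap2] * 2, k=5)
    print(find_simple_bubbles(g)[0][:2])  # ('GCAG', 'CCTA')
    print(spell_linear_path(g, "ATGC"))   # ATGCAG (stops at the bubble)
    pop_bubbles(g)
    print(spell_linear_path(g, "ATGC"))   # ATGCAGTCCTAGG
```

With k = 5, the SNP affects the k overlapping k-mers, so each branch of the bubble spans k edges between the branching node `GCAG` and the merging node `CCTA`. Before simplification, contig extension halts at the branch point. After the weaker branch is removed, the two haplotypes collapse into a single contig. Real assemblers must also distinguish such bubbles from sequencing errors and repeats, which this toy example deliberately ignores.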
Funding Information
- Ministry of Education, Culture, Sports, Science and Technology of Japan (22125008, 24310142, 221S0002)