Effect of sequence depth and length in long-read assembly of the maize inbred NC358
Open Access
- 8 May 2020
- journal article
- research article
- Published by Springer Science and Business Media LLC in Nature Communications
- Vol. 11 (1), 1-10
- https://doi.org/10.1038/s41467-020-16037-7
Abstract
Improvements in long-read data and scaffolding technologies have enabled rapid generation of reference-quality assemblies for complex genomes. Still, an assessment of critical sequence depth and read length is important for allocating limited resources. To this end, we have generated eight assemblies for the complex genome of the maize inbred line NC358 using PacBio datasets ranging from 20 to 75 × genomic depth and with N50 subread lengths of 11–21 kb. Assemblies with ≤30 × depth and N50 subread length of 11 kb are highly fragmented, with even low-copy genic regions showing degradation at 20 × depth. Distinct sequence-quality thresholds are observed for complete assembly of genes, transposable elements, and highly repetitive genomic features such as telomeres, heterochromatic knobs, and centromeres. In addition, we show high-quality optical maps can dramatically improve contiguity in even our most fragmented base assembly. This study provides a useful resource allocation reference to the community as long-read technologies continue to mature.
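The two quantities the abstract uses to characterize each dataset, genomic depth and N50 subread length, can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names `n50` and `depth`, the toy read lengths, and the toy genome size are hypothetical and chosen only to make the arithmetic easy to follow. N50 is taken here in its usual sense, the length L such that reads of length L or longer contain at least half of all sequenced bases, and depth is total sequenced bases divided by genome size.

```python
def n50(lengths):
    """Return the N50 of a collection of read lengths.

    Reads are sorted longest first and summed until the running total
    reaches half of all bases; the length of the read that crosses that
    point is the N50.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no read lengths given")


def depth(lengths, genome_size):
    """Return sequencing depth: total sequenced bases / genome size."""
    return sum(lengths) / genome_size


# Toy data (hypothetical, not from the study): five reads, 20 bases total.
toy_reads = [2, 3, 4, 5, 6]
toy_genome_size = 10

print(n50(toy_reads))                     # 5  (6 + 5 = 11 >= 10)
print(depth(toy_reads, toy_genome_size))  # 2.0
```

In these terms, a "20 × depth, 11 kb N50" dataset is one in which the summed read lengths equal 20 times the genome size and half of those bases sit in reads of 11 kb or longer.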
Funding Information
- National Science Foundation (IOS-1744001, IOS-1546727)
- United States Department of Agriculture | Agricultural Research Service (5030-21000-068-00D, 58-8062-2100-044)
- U.S. Department of Health & Human Services | NIH | National Human Genome Research Institute (Intramural Research Program)