Repetitive DNA and next-generation sequencing: computational challenges and solutions
Open Access
- 29 November 2011
- journal article
- review article
- Published by Springer Science and Business Media LLC in Nature Reviews Genetics
- Vol. 13 (1), 36-46
- https://doi.org/10.1038/nrg3117
Abstract
New high-throughput sequencing technologies have spurred explosive growth in the use of sequencing to discover mutations and structural variants in the human genome and in the number of projects to sequence and assemble new genomes. Highly efficient algorithms have been developed to align next-generation sequences to genomes, and these algorithms use a variety of strategies to place repetitive reads. Ambiguous mapping of sequences that are derived from repetitive regions makes it difficult to identify true polymorphisms and to reconstruct transcripts. Short read lengths combined with mapping ambiguities lead to false reports of single-nucleotide polymorphisms, inserts, deletions and other sequence variants. When assembling a genome de novo, repetitive sequences can lead to erroneous rearrangements, deletions, collapsed repeats and other assembly errors. Long-range linking information from paired-end reads can overcome some of the difficulties in short-read assembly.