Repetitive DNA and next-generation sequencing: computational challenges and solutions

Open Access

29 November 2011

journal article
review article
Published by Springer Science and Business Media LLC in Nature Reviews Genetics

Vol. 13 (1), 36-46
https://doi.org/10.1038/nrg3117

Abstract

New high-throughput sequencing technologies have spurred explosive growth in the use of sequencing to discover mutations and structural variants in the human genome and in the number of projects to sequence and assemble new genomes. Highly efficient algorithms have been developed to align next-generation sequences to genomes, and these algorithms use a variety of strategies to place repetitive reads. Ambiguous mapping of sequences that are derived from repetitive regions makes it difficult to identify true polymorphisms and to reconstruct transcripts. Short read lengths combined with mapping ambiguities lead to false reports of single-nucleotide polymorphisms, insertions, deletions and other sequence variants. When assembling a genome de novo, repetitive sequences can lead to erroneous rearrangements, deletions, collapsed repeats and other assembly errors. Long-range linking information from paired-end reads can overcome some of the difficulties in short-read assembly.
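The mapping ambiguity described above can be illustrated with a toy sketch. It is not taken from the article: the genome string, the repeat, and the helper `find_exact_hits` are hypothetical, and plain exact string matching stands in for a real aligner. A short read that lies entirely inside a repeat matches every copy of that repeat, whereas a longer read that also covers unique flanking sequence matches only one location.

```python
from typing import List


def find_exact_hits(genome: str, read: str) -> List[int]:
    """Return every 0-based start position where `read` matches `genome` exactly."""
    hits = []
    start = genome.find(read)
    while start != -1:
        hits.append(start)
        # Resume one base further on so overlapping matches are also reported.
        start = genome.find(read, start + 1)
    return hits


# Hypothetical toy genome: two identical repeat copies separated by unique sequence.
REPEAT = "ACGTACGTTTGA"
GENOME = "GGCTA" + REPEAT + "CCATGTC" + REPEAT + "TTAGC"

# A short read drawn from inside the repeat carries no unique sequence.
short_read = REPEAT[2:10]
# A longer read that spans the first repeat copy plus its unique flanks.
long_read = "GCTA" + REPEAT + "CCA"

short_hits = find_exact_hits(GENOME, short_read)
long_hits = find_exact_hits(GENOME, long_read)

print("short read placements:", short_hits)  # one hit per repeat copy
print("long read placements:", long_hits)    # a single, unambiguous hit
```

Real aligners tolerate mismatches and gaps and apply their own policies for multi-mapped reads, so this sketch only shows why the ambiguity arises, not how any particular tool resolves it.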

Cited by 1422 articles