Genome-wide detection of segmental duplications and potential assembly errors in the human genome sequence
Open Access
- 1 April 2003
- journal article
- Published by Springer Science and Business Media LLC in Genome Biology
- Vol. 4 (4), R25
- https://doi.org/10.1186/gb-2003-4-4-r25
Abstract
Background: Previous studies have suggested that recent segmental duplications, which are often involved in chromosome rearrangements underlying genomic disease, account for some 5% of the human genome. We have developed rapid computational heuristics based on BLAST analysis to detect segmental duplications, as well as regions containing potential sequence misassignments, in the human genome assemblies.
Results: Our analysis of the June 2002 public human genome assembly revealed that 107.4 of 3,043.1 megabases (Mb) (3.53%) of sequence contained segmental duplications, each at least 5 kb in size and with at least 90% identity. We also found that 38.9 Mb (1.28%) of sequence within this assembly is likely to be involved in sequence misassignment errors. Furthermore, we have identified a significant subset (199,965 of 2,327,473, or 8.6%) of single-nucleotide polymorphisms (SNPs) in the public databases that are not true SNPs but potential paralogous sequence variants.
Conclusion: Using two distinct computational approaches, we have identified most of the sequences in the human genome that have undergone recent segmental duplications. Near-identical segmental duplications present a major challenge to the completion of the human genome sequence. Resolving the potential sequence misassignments detected in this study will require additional effort.
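The size and identity thresholds stated in the Results (at least 5 kb, at least 90% identity) can be illustrated with a minimal sketch. This is not the authors' pipeline, which the abstract does not describe in detail. It assumes alignments are already available in the standard 12-column BLAST tabular layout (query, subject, percent identity, alignment length, mismatches, gap opens, query start/end, subject start/end, e-value, bit score). The function name `candidate_duplications` and the sample rows are hypothetical.

```python
import csv
import io

# Thresholds taken from the abstract: >= 5 kb and >= 90% identity.
MIN_LENGTH_BP = 5_000
MIN_IDENTITY_PCT = 90.0


def candidate_duplications(tabular_text,
                           min_length=MIN_LENGTH_BP,
                           min_identity=MIN_IDENTITY_PCT):
    """Return (query, subject, identity, length) for hits passing both thresholds.

    Assumes the 12-column BLAST tabular layout (an assumption of this sketch).
    """
    hits = []
    reader = csv.reader(io.StringIO(tabular_text), delimiter="\t")
    for row in reader:
        # Skip blank lines and comment lines.
        if not row or row[0].startswith("#"):
            continue
        query, subject = row[0], row[1]
        identity = float(row[2])
        length = int(row[3])
        q_start, q_end = int(row[6]), int(row[7])
        s_start, s_end = int(row[8]), int(row[9])
        # Ignore the trivial alignment of a region against itself.
        if query == subject and (q_start, q_end) == (s_start, s_end):
            continue
        if length >= min_length and identity >= min_identity:
            hits.append((query, subject, identity, length))
    return hits


# Hypothetical sample rows, made up for illustration only.
SAMPLE = "\n".join([
    "chr1\tchr1\t100.00\t8000\t0\t0\t1\t8000\t1\t8000\t0.0\t14000",
    "chr1\tchr7\t95.20\t6200\t290\t8\t1000\t7199\t50000\t56199\t0.0\t9000",
    "chr1\tchr9\t88.00\t9000\t1000\t80\t1\t9000\t200\t9199\t0.0\t7000",
    "chr2\tchr5\t99.00\t1200\t12\t0\t1\t1200\t300\t1499\t0.0\t2100",
])

if __name__ == "__main__":
    for hit in candidate_duplications(SAMPLE):
        print(hit)
```

Of the four sample rows, only the chr1/chr7 alignment passes: the first row is a self-alignment, the third falls below 90% identity, and the fourth is shorter than 5 kb. A real analysis would also need to merge overlapping hits and mask common repeats, which this sketch does not attempt.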