Identification of recurrent regions of copy-number variants across multiple individuals

Open Access

22 March 2010

journal article
Published by Springer Science and Business Media LLC in BMC Bioinformatics

Vol. 11 (1), 147
https://doi.org/10.1186/1471-2105-11-147

Abstract

Background: Algorithms and software for copy-number variant (CNV) detection have been developed, but they detect CNV regions sample by sample, with individual-specific breakpoints, whereas common CNV regions are likely to occur at the same genomic locations across different individuals in a homogeneous population. Current algorithms for detecting common CNV regions do not account for the varying reliability of individual CNVs, which SNP-based CNV detection algorithms typically report as confidence scores. General methodologies for identifying these recurrent regions, especially ones designed for SNP arrays, are still needed.

Results: In this paper, we describe two new approaches for identifying common CNV regions based on (i) the frequency of occurrence of reliable CNVs, where reliability is determined by high confidence scores, and (ii) a weighted frequency of occurrence of CNVs, where the weights are determined by the confidence scores. In addition, because we often observe that partially overlapping CNV regions are a mixture of two or more distinct subregions, the regions identified by either approach can be refined into smaller subregions using a clustering algorithm. We compared the performance of the methods with sequencing-based results in terms of discordance rates, rates of departure from Hardy-Weinberg equilibrium (HWE), and the average frequency and size of the identified regions. Both the discordance rates and the rates of departure from HWE decrease when we select CNVs with higher confidence scores. We also compared our methods with two previously published methods, STAC and GISTIC, and showed that the methods we consider are better at identifying low-frequency but high-confidence CNV regions.

Conclusions: The proposed methods for identifying common CNV regions in multiple individuals perform well compared with existing methods. The identified common regions can be used for downstream analyses such as group comparisons in association studies.
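To make approaches (i) and (ii) concrete, here is a minimal Python sketch; it is not the authors' implementation. It assumes CNV calls on a single chromosome with boundaries given as inclusive probe indices, counts each individual at most once per probe, and reports a common region wherever the per-probe (weighted) frequency stays at or above a cutoff. The score threshold, the frequency cutoff, the weighting rule used for (ii) (score divided by a cap, truncated at 1), and the names CNVCall, frequency_profile, call_regions, thresholded_weight, and scaled_weight are all illustrative assumptions. The clustering step that refines regions into subregions is not shown.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class CNVCall:
    sample: str   # individual the call belongs to
    start: int    # first probe index covered (inclusive)
    end: int      # last probe index covered (inclusive)
    score: float  # confidence score reported by the CNV caller


def frequency_profile(calls: List[CNVCall], n_samples: int, n_probes: int,
                      weight: Callable[[CNVCall], float]) -> List[float]:
    """Per-probe (weighted) fraction of samples carrying a CNV.

    Each sample contributes at most once per probe: if two of its calls
    overlap a probe, the larger weight is kept.
    """
    per_sample: Dict[str, List[float]] = {}
    for call in calls:
        w = weight(call)
        if w <= 0:
            continue  # call carries no weight (e.g. fails the score threshold)
        row = per_sample.setdefault(call.sample, [0.0] * n_probes)
        for p in range(call.start, call.end + 1):
            row[p] = max(row[p], w)
    totals = [0.0] * n_probes
    for row in per_sample.values():
        for p, w in enumerate(row):
            totals[p] += w
    return [t / n_samples for t in totals]


def call_regions(profile: List[float], cutoff: float) -> List[Tuple[int, int]]:
    """Maximal runs of consecutive probes with frequency >= cutoff (inclusive bounds)."""
    regions, start = [], None
    for p, f in enumerate(profile):
        if f >= cutoff and start is None:
            start = p                       # a run begins
        elif f < cutoff and start is not None:
            regions.append((start, p - 1))  # the run ended at the previous probe
            start = None
    if start is not None:
        regions.append((start, len(profile) - 1))
    return regions


def thresholded_weight(min_score: float) -> Callable[[CNVCall], float]:
    # Approach (i): a call counts fully if its score reaches the threshold, else not at all.
    return lambda call: 1.0 if call.score >= min_score else 0.0


def scaled_weight(score_cap: float) -> Callable[[CNVCall], float]:
    # Approach (ii), assumed form: weight grows with the score and is capped at 1.
    return lambda call: min(call.score / score_cap, 1.0)


if __name__ == "__main__":
    # Toy data: 4 individuals, 10 probes (hypothetical values).
    calls = [
        CNVCall("s1", 2, 6, 30.0),
        CNVCall("s2", 3, 7, 25.0),
        CNVCall("s3", 3, 5, 5.0),   # low-confidence call
        CNVCall("s4", 8, 9, 40.0),
    ]
    freq_i = frequency_profile(calls, 4, 10, thresholded_weight(10.0))
    print(call_regions(freq_i, 0.5))    # [(3, 6)]
    freq_ii = frequency_profile(calls, 4, 10, scaled_weight(40.0))
    print(call_regions(freq_ii, 0.35))  # [(3, 5)]
```

In the toy data, the call from s3 falls below the threshold in (i) and is ignored, whereas in (ii) it still adds a weight of 0.125 to probes 3 to 5. That small contribution is what keeps those probes above the 0.35 cutoff while probe 6 drops out, which illustrates how the two schemes can delimit a region differently.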

Selected references:

Assessing the significance of chromosomal aberrations in cancer: Methodology and application to glioma. Proceedings of the National Academy of Sciences of the United States of America, 2007.
QuantiSNP: an Objective Bayes Hidden-Markov Model to detect and accurately map copy number variation using SNP genotyping data. Nucleic Acids Research, 2007.
Global variation in copy number in the human genome. Nature, 2006.
Cross-platform array comparative genomic hybridization meta-analysis separates hematopoietic and mesenchymal from epithelial tumors. Oncogene, 2006.
STAC: A method for testing the significance of DNA copy number aberrations across multiple array-CGH experiments. Genome Research, 2006.
Linkage Disequilibrium and Heritability of Copy-Number Polymorphisms within Duplicated Regions of the Human Genome. American Journal of Human Genetics, 2006.
Analysis of array CGH data: from signal ratio to gain and loss of DNA regions. Bioinformatics, 2004.
Heterozygous germline mutations in BMPR2, encoding a TGF-β receptor, cause familial primary pulmonary hypertension. Nature Genetics, 2000.
Cluster analysis and display of genome-wide expression patterns. Proceedings of the National Academy of Sciences of the United States of America, 1998.
The presenilins and Alzheimer's disease. Human Molecular Genetics, 1997.