Direct Inference of SNP Heterozygosity Rates and Resolution of LOH Detection

Abstract
Single nucleotide polymorphisms (SNPs) have been increasingly utilized to investigate somatic genetic abnormalities in premalignancy and cancer. Loss of heterozygosity (LOH) is a common alteration observed during cancer development, and SNP assays have been used to identify LOH at specific chromosomal regions. The design of such studies requires consideration of the resolution for detecting LOH throughout the genome and identification of the number and location of SNPs required to detect genetic alterations in specific genomic regions. Our study evaluated SNP distribution patterns and used probability models, Monte Carlo simulation, and real human subject genotype data to investigate the relationships among the number of SNPs, SNP heterozygosity rates, and the sensitivity (resolution) for detecting LOH. We report that the variances of SNP heterozygosity rates in dbSNP are high for a large proportion of SNPs. Two statistical methods proposed for directly inferring SNP heterozygosity rates require much smaller (intermediate) sample sizes and are feasible for practical use in SNP selection or verification. Using HapMap data, we showed that a region of LOH greater than 200 kb can be reliably detected, whereas losses smaller than 50 kb have a substantially lower detection probability when using all SNPs currently in the HapMap database. Higher densities of SNPs may exist in certain local chromosomal regions, providing some opportunities for reliably detecting LOH of segment sizes smaller than 50 kb. These results suggest that the interpretation of results from genome-wide scans for LOH using commercial arrays needs to consider the relationships among inter-SNP distance, detection probability, and sample size for a specific study. New experimental designs for LOH studies would also benefit from considering the power of detection and the sample sizes required to accomplish the proposed aims.

More than 99% of each person's genome is identical to everyone else's. Many of the differences involve single base pairs, termed single nucleotide polymorphisms (SNPs). SNPs are used as genetic markers to facilitate identification of disease-causing genes, and in cancer studies they help determine which regions of the genome may be lost (LOH) or amplified during neoplastic progression. One drawback of SNPs is their low informativity: a SNP is informative only if an individual carries different alleles on the two chromosomes of a pair, that is, if the individual is heterozygous at that SNP. If there is no informative SNP in the genomic region of interest, LOH occurring there cannot be detected. A common solution to this problem is to use arrays containing hundreds of thousands of SNPs to ensure adequate coverage, but for many studies this is prohibitive in cost and in the amount of sample required. In addition, SNP distribution itself can constrain the size of loss that can be reliably detected at the population level. We examined the relationship between chromosome loss sizes and detection probability of LOH genome-wide. The study provides useful information for researchers designing LOH-related studies and evaluating results obtained from such studies.
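The link between SNP heterozygosity rates and LOH detection probability described above can be illustrated with a small sketch. This is not the authors' model. It assumes, for illustration only, that SNPs within a lost region are independent, that each SNP i is heterozygous with a known rate h_i, and that LOH counts as detectable when at least `min_informative` SNPs in the region are heterozygous. The function names, the parameter `min_informative`, and the example rates are all hypothetical.

```python
import random


def prob_informative(het_rates, min_informative=1):
    """Exact probability that at least `min_informative` SNPs are heterozygous.

    Assumes independent SNPs (an illustrative simplification), so the count of
    heterozygous SNPs follows a Poisson-binomial distribution, built here by
    adding one SNP at a time.
    """
    dist = [1.0]  # dist[k] = P(exactly k heterozygous SNPs so far)
    for h in het_rates:
        new = [0.0] * (len(dist) + 1)
        for k, p in enumerate(dist):
            new[k] += p * (1.0 - h)   # this SNP is homozygous
            new[k + 1] += p * h       # this SNP is heterozygous
        dist = new
    return sum(dist[min_informative:])


def simulate_informative(het_rates, min_informative=1, trials=100_000, seed=0):
    """Monte Carlo estimate of the same probability, under the same assumptions."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        n_het = sum(rng.random() < h for h in het_rates)
        if n_het >= min_informative:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    rates = [0.3] * 5  # hypothetical region containing five SNPs
    print(prob_informative(rates))
    print(simulate_informative(rates))
```

Under these assumptions, fewer SNPs in a region or lower heterozygosity rates reduce the chance that any SNP is informative, which is why short lost segments are harder to detect than long ones.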
