What SNP genotyping errors are most costly for genetic association studies?
- 13 January 2004
- journal article
- research article
- Published by Wiley in Genetic Epidemiology
- Vol. 26 (2), 132-141
- https://doi.org/10.1002/gepi.10301
Abstract
Which genotype misclassification errors are most costly, in terms of increased sample size necessary (SSN) to maintain constant asymptotic power and significance level, when performing case/control studies of genetic association? We answer this question for single-nucleotide polymorphisms (SNPs), using the 2×3 χ² test of independence. Our strategy is to expand the noncentrality parameter of the asymptotic distribution of the χ² test under a specified alternative hypothesis to approximate SSN, using a linear Taylor series in the error parameters. We consider two scenarios: the first assumes Hardy-Weinberg equilibrium (HWE) for the true genotypes in both cases and controls, and the second assumes HWE only in controls. The Taylor series approximation has a relative error of less than 1% when each error rate is less than 2%. The most costly error is recording the more common homozygote as the less common homozygote, with indefinitely increasing cost coefficient as minor SNP allele frequencies approach 0 in both scenarios. The cost of misclassifying the more common homozygote to the heterozygote also becomes indefinitely large as the minor SNP allele frequency goes to 0 under both scenarios. For the violation of HWE modeled here, the cost of misclassifying a heterozygote to the less common homozygote becomes large, although bounded. Therefore, the use of SNPs with a small minor allele frequency requires careful attention to the frequency of genotyping errors to ensure that power specifications are met. Furthermore, the design of automated genotyping should minimize those errors whose cost coefficients can become indefinitely large. Genet Epidemiol 26:132–141, 2004.
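The idea in the abstract can be explored numerically with a minimal sketch. It is not the authors' analytic Taylor expansion. It assumes equal numbers of cases and controls, the same (non-differential) error matrix in both groups, and HWE in both groups (the first scenario). Under those assumptions the noncentrality parameter of the 2×3 χ² test is proportional to Σ (p_j − q_j)² / (p_j + q_j), where p_j and q_j are the case and control genotype frequencies, so the sample-size inflation is the ratio of that quantity without and with errors. All function names and the allele frequencies below are hypothetical, chosen only for illustration.

```python
"""Sketch: sample-size inflation caused by genotype misclassification."""

# Genotype order used throughout (an assumption of this sketch).
GENOTYPES = ("common homozygote", "heterozygote", "rare homozygote")


def hwe_frequencies(minor_allele_freq):
    """Genotype frequencies under Hardy-Weinberg equilibrium."""
    p = minor_allele_freq
    return [(1 - p) ** 2, 2 * p * (1 - p), p ** 2]


def single_error_matrix(true_idx, recorded_idx, rate):
    """eps[i][j] = Pr(recorded j | true i), with one error type switched on."""
    eps = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
    eps[true_idx][true_idx] -= rate
    eps[true_idx][recorded_idx] += rate
    return eps


def observed_frequencies(true_freqs, eps):
    """Recorded genotype frequencies after misclassification."""
    return [sum(true_freqs[i] * eps[i][j] for i in range(3)) for j in range(3)]


def ncp_per_case(case_freqs, control_freqs):
    """Noncentrality per case, assuming one control per case."""
    return sum(
        (p - q) ** 2 / (p + q)
        for p, q in zip(case_freqs, control_freqs)
        if p + q > 0
    )


def ssn_inflation(case_freqs, control_freqs, eps):
    """Factor by which the sample must grow to keep the noncentrality fixed."""
    lam_true = ncp_per_case(case_freqs, control_freqs)
    lam_obs = ncp_per_case(
        observed_frequencies(case_freqs, eps),
        observed_frequencies(control_freqs, eps),
    )
    return lam_true / lam_obs


def cost_coefficient(case_freqs, control_freqs, true_idx, recorded_idx, step=1e-6):
    """Finite-difference slope of the inflation at zero error rate.

    A numerical stand-in for the linear Taylor coefficient in the abstract.
    """
    eps = single_error_matrix(true_idx, recorded_idx, step)
    return (ssn_inflation(case_freqs, control_freqs, eps) - 1.0) / step


if __name__ == "__main__":
    # Hypothetical minor allele frequencies: 0.08 in cases, 0.05 in controls.
    cases = hwe_frequencies(0.08)
    controls = hwe_frequencies(0.05)
    for i in range(3):
        for j in range(3):
            if i != j:
                infl = ssn_inflation(cases, controls, single_error_matrix(i, j, 0.01))
                coef = cost_coefficient(cases, controls, i, j)
                print(f"{GENOTYPES[i]} -> {GENOTYPES[j]}: "
                      f"inflation {infl:.4f}, cost coefficient {coef:.2f}")
```

Running the script prints, for each of the six single-error types at a 1% error rate, the sample-size inflation and a finite-difference cost coefficient. Rerunning with smaller minor allele frequencies gives a rough check of how the costs behave as the minor allele becomes rare.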