Abstract
The large number of tests performed in analyzing data from genome-wide association studies has a substantial impact on the power to detect risk variants, so analytic strategies that specify the optimal set of hypotheses to be tested are needed. We propose a genome-wide strategy based on one-degree-of-freedom tests for all genotyped variants and for all untyped variants about which the observed data carry sufficient information. The set of untyped variants to be tested is determined using multi-locus measures of linkage disequilibrium and haplotype frequencies from a reference database such as HapMap (The International HapMap Consortium [2003] Nature 426:789–796). We introduce a novel statistic for testing differences in allele frequencies at untyped variants that is based on linear combinations of estimable haplotype frequencies. We also describe algorithms for finding the sets of genotyped markers to be used in testing an untyped allele, and ways of incorporating haplotypes that are observed in the study data but absent from the reference database. The proposed testing strategy can serve as the first step in the analysis of genome-wide association data. Because every test performed is directed at a specific marker, it can also be used to specify the set of polymorphisms to genotype in follow-up studies. The methodology also provides a tool for the joint analysis of data from studies performed on different genotyping platforms. Genet. Epidemiol. 2006.
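To illustrate the idea of testing an untyped allele through a linear combination of haplotype frequencies, the sketch below assumes the following. Each haplotype h_j of the flanking genotyped markers has a weight w_j = P(untyped allele | h_j) taken from a reference panel. Haplotype frequencies f_j are estimated from phase-known chromosomes in cases and controls. The untyped allele frequency is then sum_j w_j f_j. A 1-df z (equivalently chi-square) test compares cases with controls, using the multinomial variance of that linear combination. This is a simplified, hypothetical illustration and not the authors' exact statistic. In particular, with unphased genotypes the haplotype frequencies and their variance would come from an EM-type estimator instead. All function names are illustrative.

```python
import math
from typing import Sequence, Tuple


def untyped_allele_freq(hap_freqs: Sequence[float], weights: Sequence[float]) -> float:
    """Untyped allele frequency as the linear combination sum_j w_j * f_j.

    weights[j] is assumed to be P(untyped allele | typed haplotype j),
    taken from a reference panel (hypothetical input)."""
    return sum(w * f for w, f in zip(weights, hap_freqs))


def linear_combination_variance(hap_freqs: Sequence[float],
                                weights: Sequence[float],
                                n_chrom: int) -> float:
    """Variance of sum_j w_j * f_j when the f_j are multinomial proportions
    estimated from n_chrom phase-known chromosomes (simplifying assumption)."""
    mean = untyped_allele_freq(hap_freqs, weights)
    second_moment = sum(w * w * f for w, f in zip(weights, hap_freqs))
    return (second_moment - mean * mean) / n_chrom


def untyped_allele_test(case_freqs: Sequence[float],
                        ctrl_freqs: Sequence[float],
                        weights: Sequence[float],
                        n_case_chrom: int,
                        n_ctrl_chrom: int) -> Tuple[float, float]:
    """Return (chi-square statistic with 1 df, two-sided p-value) for the
    case-control difference in the inferred untyped allele frequency."""
    diff = (untyped_allele_freq(case_freqs, weights)
            - untyped_allele_freq(ctrl_freqs, weights))
    var = (linear_combination_variance(case_freqs, weights, n_case_chrom)
           + linear_combination_variance(ctrl_freqs, weights, n_ctrl_chrom))
    z = diff / math.sqrt(var)
    # Two-sided normal p-value; identical to the 1-df chi-square tail for z**2.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z * z, p_value


if __name__ == "__main__":
    # Three typed haplotypes. The third only partially predicts the untyped allele.
    weights = [1.0, 0.0, 0.5]
    chi2, p = untyped_allele_test([0.5, 0.3, 0.2], [0.3, 0.5, 0.2],
                                  weights, 2000, 2000)
    print(chi2, p)
```

Weights of exactly 0 or 1 correspond to typed haplotypes that fully determine the untyped allele. Fractional weights let haplotypes that only partially predict it still contribute, at the cost of a larger variance.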