A sparse marker extension tree algorithm for selecting the best set of haplotype tagging single nucleotide polymorphisms

17 November 2005

journal article
research article
Published by Wiley in Genetic Epidemiology

Vol. 29 (4), 336-352
https://doi.org/10.1002/gepi.20095

Abstract

Single nucleotide polymorphisms (SNPs) play a central role in the identification of susceptibility genes for common diseases. Recent empirical studies of the human genome have revealed block‐like structures, and each block contains a set of haplotype tagging SNPs (htSNPs) that capture a large fraction of the haplotype diversity. Herein, we present an innovative sparse marker extension tree (SMET) algorithm to select optimal htSNP set(s). SMET reduces the search space considerably compared with a full enumeration strategy, and therefore improves computing efficiency. We tested this algorithm on several datasets at three different genomic scales: (1) gene‐wide (NOS3, CRP, IL6, PPARA, and TNF), (2) region‐wide (a Whitehead Institute inflammatory bowel disease dataset and a UK Graves' disease dataset), and (3) chromosome‐wide (chromosome 22). Because its haplotype diversity coverage (ϕ) is adjustable, SMET offers geneticists greater flexibility in SNP tagging than lossless methods. In simulation studies, we found that (1) an initial sample size of 50 individuals (100 chromosomes) or more is needed for htSNP selection; (2) the SNP tagging strategy is considerably more efficient when the underlying block structure is taken into account; and (3) htSNP sets at 80–90% ϕ are more cost‐effective than the lossless sets in terms of relative power, relative risk ratio estimation, and genotyping effort. Our study suggests that the novel SMET algorithm is a valuable tool for association tests. Genet. Epidemiol. 29:336–352, 2005.
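
To make the search problem concrete, the sketch below shows the full enumeration baseline that the abstract contrasts SMET with; it is not the SMET algorithm itself, which the abstract does not specify. The coverage measure used here (the fraction of sampled chromosomes whose full haplotype is the most common one consistent with their alleles at the chosen SNPs) is an assumption made for illustration and may differ from the paper's definition of ϕ. The function names `coverage` and `smallest_tag_sets` and the toy haplotypes are likewise hypothetical.

```python
from collections import Counter
from itertools import combinations


def coverage(haplotypes, snps):
    """Assumed stand-in for phi: share of chromosomes correctly identified
    when only the alleles at positions `snps` are observed."""
    # Group full haplotypes by the allele pattern seen at the chosen SNPs.
    by_pattern = {}
    for hap in haplotypes:
        pattern = tuple(hap[i] for i in snps)
        by_pattern.setdefault(pattern, Counter())[hap] += 1
    # Within each pattern, only the most frequent full haplotype is "captured".
    captured = sum(c.most_common(1)[0][1] for c in by_pattern.values())
    return captured / len(haplotypes)


def smallest_tag_sets(haplotypes, phi):
    """Full enumeration: try every SNP subset in order of increasing size and
    return all subsets of the smallest size whose coverage reaches `phi`."""
    n_snps = len(haplotypes[0])
    for k in range(n_snps + 1):
        hits = [s for s in combinations(range(n_snps), k)
                if coverage(haplotypes, s) >= phi]
        if hits:
            return hits
    return []


# Toy sample of 10 chromosomes over 4 SNPs (alleles coded 0/1).
# SNPs 0 and 1 are perfectly correlated, as are SNPs 2 and 3.
sample = ["0000"] * 4 + ["1100"] * 3 + ["0011"] * 2 + ["1111"] * 1

lossless = smallest_tag_sets(sample, phi=1.0)   # two SNPs needed
relaxed = smallest_tag_sets(sample, phi=0.65)   # one SNP suffices
```

In this toy sample, relaxing the coverage threshold halves the number of SNPs to genotype, which mirrors the trade-off the abstract describes. The enumeration visits up to 2^n subsets for n SNPs, which is the cost that motivates a pruned search.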

Cited by 9 articles