Genetic Association Mapping via Evolution-Based Clustering of Haplotypes

Open Access

6 July 2007

journal article
research article
Published by Public Library of Science (PLoS) in PLoS Genetics

Vol. 3 (7), e111
https://doi.org/10.1371/journal.pgen.0030111

Abstract

Multilocus analysis of single nucleotide polymorphism haplotypes is a promising approach to dissecting the genetic basis of complex diseases. We propose a coalescent-based model for association mapping that potentially increases the power to detect disease-susceptibility variants in genetic association studies. The approach uses Bayesian partition modelling to cluster haplotypes with similar disease risks by exploiting evolutionary information. We focus on candidate gene regions with densely spaced markers and model the chromosomal segments in high linkage disequilibrium within them, assuming a perfect phylogeny. To make this assumption more realistic, we split the chromosomal region of interest into sub-regions or windows of high linkage disequilibrium. The haplotype space is then partitioned into disjoint clusters, within which the phenotype–haplotype association is assumed to be the same. For example, in case-control studies, we expect chromosomal segments bearing the causal variant on a common ancestral background to be more frequent among cases than controls, giving rise to two separate haplotype clusters. The novelty of our approach arises from the fact that the distance used for clustering haplotypes has an evolutionary interpretation, as haplotypes are clustered according to the time to their most recent common ancestor. Our approach is fully Bayesian, and we develop a Markov chain Monte Carlo algorithm to sample efficiently over the space of possible partitions. We compare the proposed approach to both single-marker analyses and recently proposed multi-marker methods and show that the Bayesian partition modelling performs similarly in localizing the causal allele while yielding lower false-positive rates. The method is also computationally quicker than other multi-marker approaches. We present an application to real genotype data from the CYP2D6 gene region, which has a confirmed role in drug metabolism, where we succeed in mapping the location of the susceptibility variant within a small error.

Genetic association studies offer great promise in dissecting the genetic contribution to complex diseases. The underlying idea of such studies is to search for genetic variants along the genome that appear to be associated with a trait of interest, e.g., disease status for a binary trait. One then proceeds by genotyping unrelated individuals at several marker sites, searching for positions where single markers or combinations of multiple markers on the paternally and maternally inherited chromosomes (or haplotypes) appear to discriminate between affected and unaffected individuals, flagging genomic regions that may harbour disease susceptibility variants. The statistical analysis of such studies, however, poses several challenges, such as multiplicity and false-positive issues, due to the large number of markers considered. Focusing on case-control studies, we present a novel evolution-based Bayesian partition model that clusters haplotypes with similar disease risks. The novelty of this approach lies in the use of perfect phylogenies, which offer a sensible and computationally efficient approximation of the ancestry of a sample of chromosomes. We show that the incorporation of phylogenetic information leads to low false-positive rates, while our model fitting offers computational advantages over similar recently proposed coalescent-based haplotype clustering methods.
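To make the perfect-phylogeny assumption and the idea of splitting a region into windows concrete, here is a minimal Python sketch. It is not the authors' procedure: the abstract defines windows by high linkage disequilibrium, whereas this sketch swaps in the standard four-gamete test (a set of biallelic sites admits a perfect phylogeny only if no pair of sites shows all four allele combinations). The function names and the greedy left-to-right splitting rule are assumptions made for illustration.

```python
from itertools import combinations


def four_gamete_compatible(haplotypes, i, j):
    """Return True if sites i and j show at most three of the four gametes."""
    gametes = {(h[i], h[j]) for h in haplotypes}
    return len(gametes) < 4


def admits_perfect_phylogeny(haplotypes, sites):
    """Pairwise four-gamete check over a set of biallelic sites."""
    return all(
        four_gamete_compatible(haplotypes, i, j)
        for i, j in combinations(sites, 2)
    )


def greedy_windows(haplotypes):
    """Hypothetical splitting rule (an assumption, not from the paper):
    scan sites left to right and start a new window whenever the next
    site conflicts with any site already in the current window."""
    n_sites = len(haplotypes[0])
    windows, current = [], []
    for s in range(n_sites):
        if all(four_gamete_compatible(haplotypes, s, t) for t in current):
            current.append(s)
        else:
            windows.append(current)
            current = [s]
    if current:
        windows.append(current)
    return windows


# Toy haplotypes over three biallelic sites, coded as strings of 0/1.
toy = ["000", "010", "011", "101"]
print(greedy_windows(toy))
```

In the toy data, sites 1 and 2 carry all four allele combinations, so they cannot sit in the same window under this criterion, while sites 0 and 1 can.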
