Multi-population GWA mapping via multi-task regularized regression
Open Access
- 1 June 2010
- journal article
- research article
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 26 (12), i208-i216
- https://doi.org/10.1093/bioinformatics/btq191
Abstract
Motivation: Population heterogeneity through admixing of different founder populations can produce spurious associations in genome-wide association studies that are linked to the population structure rather than the phenotype. Since samples from the same population generally co-evolve, different populations may or may not share the same genetic underpinnings for the seemingly common phenotype. Our goal is to develop a unified framework for detecting causal genetic markers through a joint association analysis of multiple populations.
Results: Based on a multi-task regression principle, we present a multi-population group lasso algorithm using L1/L2-regularized regression for joint association analysis of multiple populations that are stratified either via population survey or computational estimation. Our algorithm combines information from genetic markers across populations to identify causal markers. It also implicitly accounts for correlations between the genetic markers, thus enabling better control over false positive rates. Joint analysis across populations enables the detection of weak associations common to all populations with greater power than in a separate analysis of each population. At the same time, the regression-based framework allows causal alleles that are unique to a subset of the populations to be correctly identified. We demonstrate the effectiveness of our method on HapMap-simulated and lactase persistence datasets, where we significantly outperform state-of-the-art methods, with greater power for detecting weak associations and reduced spurious associations.
Availability: Software will be available at http://www.sailing.cs.cmu.edu/
Contact: epxing@cs.cmu.edu
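The L1/L2 penalty named in the abstract can be illustrated with a minimal sketch. It is not the authors' software. It assumes a squared-error loss per population and a plain proximal-gradient solver, neither of which the abstract specifies. The function names, the penalty weight `lam`, and the simulated genotypes are all hypothetical. Each row of the coefficient matrix `B` holds one marker's effect in every population, and the penalty shrinks whole rows, so a marker is kept or dropped jointly across populations while its per-population effect sizes remain free to differ.

```python
import numpy as np

def group_soft_threshold(B, t):
    """Shrink each row of B (one marker across all populations) toward zero."""
    norms = np.linalg.norm(B, axis=1, keepdims=True)
    scale = np.maximum(0.0, 1.0 - t / np.maximum(norms, 1e-12))
    return B * scale

def multi_population_group_lasso(Xs, ys, lam, n_iter=500):
    """Proximal gradient on an assumed objective:
    sum_k ||y_k - X_k b_k||^2 / (2 n_k) + lam * sum_j ||B[j, :]||_2
    Xs, ys: one genotype matrix and one phenotype vector per population.
    """
    n_markers, n_pops = Xs[0].shape[1], len(Xs)
    B = np.zeros((n_markers, n_pops))
    # The loss is separable by population, so the largest per-population
    # Lipschitz constant gives a safe step size.
    lipschitz = max(np.linalg.norm(X, 2) ** 2 / X.shape[0] for X in Xs)
    step = 1.0 / lipschitz
    for _ in range(n_iter):
        # Column k is the gradient of population k's squared-error loss.
        G = np.column_stack([
            X.T @ (X @ B[:, k] - y) / X.shape[0]
            for k, (X, y) in enumerate(zip(Xs, ys))
        ])
        B = group_soft_threshold(B - step * G, step * lam)
    return B

# Simulated data (an assumption for illustration only): marker 3 has a
# modest effect in both populations, marker 7 only in population 0.
rng = np.random.default_rng(0)
n_samples, n_markers = 300, 20
effects = [{3: 0.5, 7: 0.8}, {3: 0.5}]
Xs, ys = [], []
for pop_effects in effects:
    X = rng.integers(0, 3, size=(n_samples, n_markers)).astype(float)
    X -= X.mean(axis=0)  # center the 0/1/2 genotype codes
    y = sum(beta * X[:, j] for j, beta in pop_effects.items())
    y = y + 0.5 * rng.standard_normal(n_samples)
    Xs.append(X)
    ys.append(y - y.mean())

B = multi_population_group_lasso(Xs, ys, lam=0.1)
row_norms = np.linalg.norm(B, axis=1)  # joint evidence per marker
print(np.argsort(row_norms)[::-1][:5])  # markers ranked by joint evidence
```

Ranking markers by row norm reflects the joint analysis described above, while the individual entries of a selected row show in which populations the effect is concentrated.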