The Bayesian lasso for genome-wide association studies

Open Access

14 December 2010

journal article
research article
Published by Oxford University Press (OUP) in Bioinformatics

Vol. 27 (4), 516-523
https://doi.org/10.1093/bioinformatics/btq688

Abstract

Motivation: Despite their success in identifying genes that affect complex diseases or traits, current genome-wide association studies (GWASs) based on single-SNP analysis are too simple to give a comprehensive picture of the genetic architecture of phenotypes. A simultaneous analysis of a large number of SNPs, although statistically challenging, especially with a small number of samples, is crucial for genetic modeling.

Method: We propose a two-stage procedure for multi-SNP modeling and analysis in GWASs: first producing a ‘preconditioned’ response variable using supervised principal component analysis, and then formulating a Bayesian lasso to select a subset of significant SNPs. The Bayesian lasso is implemented as a hierarchical model in which scale mixtures of normals are used as prior distributions for the genetic effects and exponential priors are placed on their variances; the model is fitted using a Markov chain Monte Carlo (MCMC) algorithm. Our approach obviates the choice of the lasso parameter by imposing a diffuse hyperprior on it and estimating it along with the other parameters. It is particularly powerful for selecting the most relevant SNPs in GWASs, where the number of predictors exceeds the number of observations.

Results: The new approach was examined through a simulation study. By using the approach to analyze a real dataset from the Framingham Heart Study, we detected several significant genes that are associated with body mass index (BMI). Our findings support previous results on BMI-related SNPs and also provide new insights into the genetic control of this trait.

Availability: The computer code for the approach is available at the Penn State Center for Statistical Genetics web site, http://statgen.psu.edu.

Contact: rwu@hes.hmc.psu.edu

Supplementary information: Supplementary data are available at Bioinformatics online.
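The hierarchical model outlined under Method (normal scale-mixture priors on the effects, exponential priors on their variances, and a hyperprior on the lasso parameter) can be illustrated with a small Gibbs sampler. The sketch below is not the authors' code. It assumes the standard Bayesian lasso formulation of Park and Casella (2008), omits the supervised principal component preconditioning stage, and uses a hypothetical function name (`bayesian_lasso_gibbs`) and hypothetical hyperparameter defaults (`r`, `delta`) for the gamma hyperprior on the squared lasso parameter.

```python
import numpy as np


def bayesian_lasso_gibbs(X, y, n_iter=2000, burn_in=500, r=1.0, delta=0.1, seed=0):
    """Gibbs sampler for a Bayesian lasso (Park-Casella style; an assumed form).

    Assumed model: y ~ N(X beta, sigma2 I), beta_j ~ N(0, sigma2 * tau2_j),
    tau2_j ~ Exp(lambda2 / 2), lambda2 ~ Gamma(r, rate=delta), p(sigma2) ~ 1/sigma2.
    Returns posterior draws of beta (standardized scale) and of lambda.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape

    # Standardize predictors and center the response so no intercept is needed.
    sd = X.std(axis=0)
    sd[sd == 0] = 1.0  # guard against monomorphic (constant) columns
    Xs = (X - X.mean(axis=0)) / sd
    yc = y - y.mean()
    XtX = Xs.T @ Xs
    Xty = Xs.T @ yc

    tau2 = np.ones(p)
    sigma2 = float(yc.var())
    lambda2 = 1.0
    beta_draws = np.empty((n_iter - burn_in, p))
    lambda_draws = np.empty(n_iter - burn_in)

    for it in range(n_iter):
        # beta | rest ~ N(A^{-1} X'y, sigma2 * A^{-1}), with A = X'X + diag(1/tau2).
        A = XtX + np.diag(1.0 / tau2)
        L = np.linalg.cholesky(A)
        mean = np.linalg.solve(A, Xty)
        z = rng.standard_normal(p)
        beta = mean + np.sqrt(sigma2) * np.linalg.solve(L.T, z)

        # sigma2 | rest ~ Inverse-Gamma(shape, rate), drawn as 1 / Gamma.
        resid = yc - Xs @ beta
        shape = 0.5 * (n - 1 + p)
        rate = 0.5 * (resid @ resid + np.sum(beta**2 / tau2))
        sigma2 = 1.0 / rng.gamma(shape, 1.0 / rate)

        # 1/tau2_j | rest ~ Inverse-Gaussian(mean_j, shape=lambda2).
        b2 = np.maximum(beta**2, 1e-12)  # avoid division by zero
        inv_tau2 = rng.wald(np.sqrt(lambda2 * sigma2 / b2), lambda2)
        tau2 = 1.0 / np.maximum(inv_tau2, 1e-12)

        # lambda2 | rest ~ Gamma(p + r, rate = sum(tau2)/2 + delta).
        lambda2 = rng.gamma(p + r, 1.0 / (0.5 * tau2.sum() + delta))

        if it >= burn_in:
            beta_draws[it - burn_in] = beta
            lambda_draws[it - burn_in] = np.sqrt(lambda2)

    return beta_draws, lambda_draws


if __name__ == "__main__":
    # Toy data: 20 simulated SNPs coded 0/1/2, three with non-zero effects.
    rng = np.random.default_rng(1)
    X = rng.binomial(2, 0.3, size=(120, 20)).astype(float)
    beta_true = np.zeros(20)
    beta_true[[0, 5, 10]] = [2.0, -2.0, 1.5]
    y = X @ beta_true + rng.normal(size=120)
    draws, lam = bayesian_lasso_gibbs(X, y)
    print(np.round(draws.mean(axis=0), 2))
```

Because the lasso parameter is sampled along with the other unknowns, no cross-validation step is needed, which mirrors the point made in the abstract. The matrix `A` stays positive definite even when the number of SNPs exceeds the number of samples, although a dense Cholesky factorization of this kind would need replacing for genome-scale inputs.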
