Reprioritizing Genetic Associations in Hit Regions Using LASSO‐Based Resample Model Averaging
Open Access
- 30 April 2012
- journal article
- research article
- Published by Wiley in Genetic Epidemiology
- Vol. 36 (5), 451-462
- https://doi.org/10.1002/gepi.21639
Abstract
Significance testing one SNP at a time has proven useful for identifying genomic regions that harbor variants affecting human disease. But after an initial genome scan has identified a “hit region” of association, single-locus approaches can falter. Local linkage disequilibrium (LD) can make both the number of underlying true signals and their identities ambiguous. Simultaneous modeling of multiple loci should help. However, it is typically applied ad hoc: conditioning on the top SNPs, with limited exploration of the model space and no assessment of how sensitive model choice was to sampling variability. Formal alternatives exist but are seldom used. Bayesian variable selection is coherent but requires specifying a full joint model, including priors on parameters and the model space. Penalized regression methods (e.g., LASSO) appear promising but require calibration, and, once calibrated, lead to a choice of SNPs that can be misleadingly decisive. We present a general method for characterizing uncertainty in model choice that is tailored to reprioritizing SNPs within a hit region under strong LD. Our method, LASSO local automatic regularization resample model averaging (LLARRMA), combines LASSO shrinkage with resample model averaging and multiple imputation, estimating for each SNP the probability that it would be included in a multi-SNP model in alternative realizations of the data. We apply LLARRMA to simulations based on case-control genome-wide association studies data, and find that when there are several causal loci and strong LD, LLARRMA identifies a set of candidates that is enriched for true signals relative to single locus analysis and to the recently proposed method of Stability Selection. Genet. Epidemiol. 36:451-462, 2012.
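The core idea in the abstract, estimating for each SNP how often it would enter a multi-SNP LASSO model across alternative realizations of the data, can be illustrated with a minimal sketch. This is not the authors' LLARRMA implementation: it uses a fixed penalty `lam` in place of local automatic regularization, omits multiple imputation, resamples by drawing half of the subjects without replacement, and runs on toy genotypes simulated here. All function names, parameter values, and the simulation settings are assumptions made for illustration.

```python
import math
import random


def soft_threshold(z, t):
    """Proximal operator of the L1 penalty: shrink z toward zero by t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def standardize(X):
    """Center each column and scale to unit variance (constant columns become 0)."""
    n, p = len(X), len(X[0])
    cols = []
    for j in range(p):
        col = [row[j] for row in X]
        mean = sum(col) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / n)
        cols.append([(v - mean) / sd if sd > 0 else 0.0 for v in col])
    return [[cols[j][i] for j in range(p)] for i in range(n)]


def lasso_logistic(X, y, lam, n_iter=40):
    """L1-penalized logistic regression fitted by proximal gradient descent.

    Assumes standardized columns, so 4 / (p + 1) is a safe step size.
    The intercept b0 is not penalized.
    """
    n, p = len(X), len(X[0])
    step = 4.0 / (p + 1)
    b0, beta = 0.0, [0.0] * p
    for _ in range(n_iter):
        g0, grad = 0.0, [0.0] * p
        for xi, yi in zip(X, y):
            eta = b0 + sum(b * x for b, x in zip(beta, xi))
            resid = 1.0 / (1.0 + math.exp(-eta)) - yi
            g0 += resid
            for j in range(p):
                grad[j] += resid * xi[j]
        b0 -= step * g0 / n
        beta = [soft_threshold(b - step * g / n, step * lam)
                for b, g in zip(beta, grad)]
    return beta


def inclusion_frequencies(X, y, lam, n_resamples=20, frac=0.5, seed=0):
    """Fraction of subsamples in which each SNP gets a nonzero coefficient."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    counts = [0] * p
    for _ in range(n_resamples):
        idx = rng.sample(range(n), int(frac * n))
        X_sub = standardize([X[i] for i in idx])
        y_sub = [y[i] for i in idx]
        beta = lasso_logistic(X_sub, y_sub, lam)
        for j in range(p):
            if beta[j] != 0.0:
                counts[j] += 1
    return [c / n_resamples for c in counts]


# Toy case-control data (hypothetical): SNP 0 is causal, SNP 1 is in
# strong LD with it, and SNPs 2-4 are independent null markers.
sim = random.Random(1)


def draw_genotype():
    """Minor-allele count (0, 1, or 2) at allele frequency 0.4."""
    return int(sim.random() < 0.4) + int(sim.random() < 0.4)


X, y = [], []
for _ in range(300):
    g0 = draw_genotype()
    g1 = g0 if sim.random() < 0.9 else draw_genotype()
    X.append([g0, g1] + [draw_genotype() for _ in range(3)])
    p_case = 1.0 / (1.0 + math.exp(-(-1.2 + 1.5 * g0)))
    y.append(1 if sim.random() < p_case else 0)

freqs = inclusion_frequencies(X, y, lam=0.1)
print([round(f, 2) for f in freqs])
```

Each entry of `freqs` is a resample-based estimate of how often the corresponding SNP is selected, which gives a graded ranking instead of the single, potentially over-decisive SNP set that one LASSO fit returns.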