Bayesian Test for Colocalisation between Pairs of Genetic Association Studies Using Summary Statistics
Open Access
- 15 May 2014
- journal article
- research article
- Published by Public Library of Science (PLoS) in PLoS Genetics
- Vol. 10 (5), e1004383
- https://doi.org/10.1371/journal.pgen.1004383
Abstract
Genetic association studies, especially the genome-wide association study (GWAS) design, have provided a wealth of novel insights into the aetiology of a wide range of human diseases and traits, in particular cardiovascular diseases and lipid biomarkers. The next challenge is to understand the molecular basis of these associations. The integration of multiple association datasets, including gene expression datasets, can contribute to this goal. We have developed a novel statistical methodology to assess whether two association signals are consistent with a shared causal variant. One application is the integration of disease scans with expression quantitative trait locus (eQTL) studies, but any pair of GWAS datasets can be integrated in this framework. We demonstrate the value of the approach by re-analysing a gene expression dataset in 966 liver samples with a published meta-analysis of lipid traits including >100,000 individuals of European ancestry. Combining all lipid biomarkers, our re-analysis supported 26 out of 38 reported colocalisation results with eQTLs and identified 14 new colocalisation results, highlighting the value of a formal statistical test. In three cases of reported eQTL-lipid pairs (SYPL2, IFT172, TBKBP1) for which our analysis suggests that the eQTL pattern is not consistent with the lipid association, we identify alternative colocalisation results with SORT1, GCKR, and KPNB1, indicating that these genes are more likely to be causal in these genomic intervals. A key feature of the method is the ability to derive the output statistics from single-SNP summary statistics, making it possible to perform systematic meta-analysis-type comparisons across multiple GWAS datasets (implemented online at http://coloc.cs.ucl.ac.uk/coloc/).
Our methodology provides information about candidate causal genes in associated intervals and has direct implications for the understanding of complex diseases as well as the design of drugs to target disease pathways.
Author Summary
Genome-wide association studies (GWAS) have found a large number of genetic regions (“loci”) affecting clinical end-points and phenotypes, many outside coding intervals. One approach to understanding the biological basis of these associations has been to explore whether GWAS signals from intermediate cellular phenotypes, in particular gene expression, are located in the same loci (“colocalise”) and are potentially mediating the disease signals. However, it is not clear how to assess whether the same variants are responsible for the two GWAS signals or whether distinct causal variants lie close to each other. In this paper, we describe a statistical method that needs only single-variant summary statistics to test for colocalisation of GWAS signals. We describe one application of our method to a meta-analysis of blood lipids and liver expression, although any two datasets resulting from association studies can be used. Our method is able to detect the subset of GWAS signals explained by regulatory effects and identify candidate genes affected by the same GWAS variants. As summary GWAS data are increasingly available, applications of colocalisation methods to integrate the findings will be essential for functional follow-up, and will also be particularly useful to identify tissue-specific signals in eQTL datasets.
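To make the idea of testing colocalisation from single-variant summary statistics concrete, the following is a minimal sketch, not the authors' implementation. It assumes each SNP is summarised by an effect estimate and its variance, converts these to Wakefield-style approximate Bayes factors, assumes at most one causal variant per trait in the region, and combines the evidence into posterior probabilities for five hypotheses: H0 (no association), H1 (trait 1 only), H2 (trait 2 only), H3 (both traits, distinct causal variants) and H4 (both traits, shared causal variant). The function names, the prior standard deviation `prior_sd`, and the prior probabilities `p1`, `p2` and `p12` are illustrative assumptions.

```python
import math


def log_sum(xs):
    """Return log(sum(exp(x))) computed stably."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def log_diff(a, b):
    """Return log(exp(a) - exp(b)); gives -inf when the difference is not positive."""
    if b >= a:
        return float("-inf")
    return a + math.log1p(-math.exp(b - a))


def log_abf(beta, var_beta, prior_sd=0.15):
    """Log approximate Bayes factor for one SNP (prior_sd is an assumed value)."""
    z2 = beta * beta / var_beta
    r = prior_sd ** 2 / (prior_sd ** 2 + var_beta)  # shrinkage factor
    return 0.5 * (math.log(1.0 - r) + r * z2)


def coloc_posteriors(trait1, trait2, p1=1e-4, p2=1e-4, p12=1e-5):
    """Posterior probabilities [H0, H1, H2, H3, H4] for one region.

    trait1 and trait2 are lists of (beta, var_beta) pairs over the same SNPs,
    in the same order. The per-SNP priors p1, p2, p12 are assumed values.
    """
    l1 = [log_abf(b, v) for b, v in trait1]
    l2 = [log_abf(b, v) for b, v in trait2]
    s1, s2 = log_sum(l1), log_sum(l2)
    # Evidence that the same SNP drives both traits
    s12 = log_sum([a + b for a, b in zip(l1, l2)])
    log_h = [
        0.0,                                                  # H0
        math.log(p1) + s1,                                    # H1
        math.log(p2) + s2,                                    # H2
        math.log(p1) + math.log(p2) + log_diff(s1 + s2, s12),  # H3
        math.log(p12) + s12,                                  # H4
    ]
    total = log_sum(log_h)
    return [math.exp(h - total) for h in log_h]


# Toy data: three SNPs, standard error 0.1 for every estimate.
eqtl = [(0.8, 0.01), (0.05, 0.01), (0.02, 0.01)]
lipid_same_peak = [(0.8, 0.01), (0.05, 0.01), (0.02, 0.01)]
lipid_other_peak = [(0.02, 0.01), (0.05, 0.01), (0.8, 0.01)]

shared = coloc_posteriors(eqtl, lipid_same_peak)
distinct = coloc_posteriors(eqtl, lipid_other_peak)
print("same peak:     ", [round(p, 4) for p in shared])
print("different peak:", [round(p, 4) for p in distinct])
```

In this toy example the posterior mass should concentrate on H4 when the two traits peak at the same SNP and on H3 when they peak at different SNPs, which mirrors the distinction between a shared causal variant and distinct nearby variants described above.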