Power analysis of transcriptome-wide association study: Implications for practical protocol choice
Open Access
- 26 February 2021
- journal article
- research article
- Published by Public Library of Science (PLoS) in PLoS Genetics
- Vol. 17 (2), e1009405
- https://doi.org/10.1371/journal.pgen.1009405
Abstract
The transcriptome-wide association study (TWAS) has emerged as one of several promising techniques for integrating multi-scale ‘omics’ data into traditional genome-wide association studies (GWAS). Unlike GWAS, which associates phenotypic variance directly with genetic variants, TWAS uses a reference dataset to train a predictive model for gene expressions, which allows it to associate phenotype with variants through the mediating effect of expressions. Although effective, this core innovation of TWAS is poorly understood, since the predictive accuracy of the genotype-expression model is generally low and further bounded by expression heritability. This raises the question: to what degree does the accuracy of the expression model affect the power of TWAS? Furthermore, would replacing predictions with actual, experimentally determined expressions improve power? To answer these questions, we compared the power of GWAS, TWAS, and a hypothetical protocol utilizing real expression data. We derived non-centrality parameters (NCPs) for linear mixed models (LMMs) to enable closed-form calculations of statistical power that do not rely on specific protocol implementations. We examined two representative scenarios: causality (genotype contributes to phenotype through expression) and pleiotropy (genotype contributes directly to both phenotype and expression), and also tested the effects of various properties including expression heritability. Our analysis reveals two main outcomes: (1) Under pleiotropy, the use of predicted expressions in TWAS is superior to actual expressions. This explains why TWAS can function with weak expression models, and shows that TWAS remains relevant even when real expressions are available. (2) GWAS outperforms TWAS when expression heritability is below a threshold of 0.04 under causality, or 0.06 under pleiotropy. 
Analysis of existing publications suggests that TWAS has been misapplied in place of GWAS in situations where expression heritability is low.
Author summary
We compared the effectiveness of three methods for finding genetic effects on disease in order to quantify their strengths and help researchers choose the best protocol for their data. The genome-wide association study (GWAS) is the standard method for identifying how the genetic differences between individuals relate to disease. Recently, the transcriptome-wide association study (TWAS) has improved on GWAS by also estimating the effect of each genetic variant on the activity level (or expression) of genes related to disease. The effectiveness of TWAS is surprising because its estimates of gene expressions are very inaccurate, so we ask whether a method using real expression data instead of estimates would perform better. Unlike past studies, which use only simulation to compare these methods, we incorporate novel statistical calculations to make our comparisons more accurate and universally applicable. We discover that, depending on the type of relationship between genetics, gene expression, and disease, the estimates used by TWAS could actually be more relevant than real gene expressions. We also find that TWAS is not always better than GWAS when the relationship between genetics and expression is weak, and we identify specific turning points where past studies have incorrectly used TWAS instead of GWAS.
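The abstract's idea of obtaining power in closed form from a non-centrality parameter (NCP) can be illustrated with a minimal sketch. It does not reproduce the paper's NCP derivations for GWAS, TWAS, or the real-expression protocol; it only assumes that a test statistic follows a 1-degree-of-freedom non-central chi-square distribution with a given NCP. The function name `power_from_ncp` and the example NCP values are hypothetical and chosen purely for illustration.

```python
from math import sqrt
from statistics import NormalDist


def power_from_ncp(ncp: float, alpha: float = 0.05) -> float:
    """Power of a 1-df chi-square test with non-centrality parameter `ncp`.

    A 1-df non-central chi-square variable is Z**2 with Z ~ N(sqrt(ncp), 1),
    so the power equals P(|Z| > z_crit) and needs only the normal CDF.
    """
    if ncp < 0:
        raise ValueError("ncp must be non-negative")
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1.0 - alpha / 2.0)  # two-sided critical value
    shift = sqrt(ncp)
    upper_tail = 1.0 - std_normal.cdf(z_crit - shift)
    lower_tail = std_normal.cdf(-z_crit - shift)
    return upper_tail + lower_tail


if __name__ == "__main__":
    # Illustrative NCP values only; they are not taken from the paper.
    for ncp in (0.0, 5.0, 10.0, 30.0):
        print(f"NCP = {ncp:5.1f}  power = {power_from_ncp(ncp, alpha=5e-8):.4f}")
```

With an NCP of zero the function returns the significance level itself, and power rises monotonically with the NCP. This is why comparing protocols reduces to comparing their NCPs once those are available in closed form.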
Funding Information
- NSERC Discovery Grant (RGPIN-2018-04328)
- NSERC Discovery Grant (RGPIN-2017-04860)
- Canada Foundation for Innovation JELF grant (36605)
- New Frontiers in Research Fund (NFRFE-2018-00748)
- Alberta Children’s Hospital Research Institute (Clinical Research Fund - 10027289)
- Alberta Children’s Hospital Research Institute (Startup grant - 10013532)
- Alberta Children’s Hospital Research Institute (Postdoctoral fellowship)