Power analysis of transcriptome-wide association study: Implications for practical protocol choice

Open Access

26 February 2021

journal article
research article
Published by Public Library of Science (PLoS) in PLoS Genetics

Vol. 17 (2), e1009405
https://doi.org/10.1371/journal.pgen.1009405

Abstract

The transcriptome-wide association study (TWAS) has emerged as one of several promising techniques for integrating multi-scale ‘omics’ data into traditional genome-wide association studies (GWAS). Unlike GWAS, which associates phenotypic variance directly with genetic variants, TWAS uses a reference dataset to train a predictive model of gene expression, which allows it to associate phenotype with variants through the mediating effect of expression. Although effective, this core innovation of TWAS is poorly understood, since the predictive accuracy of the genotype-expression model is generally low and is further bounded by expression heritability. This raises two questions: to what degree does the accuracy of the expression model affect the power of TWAS, and would replacing predictions with actual, experimentally measured expression improve power? To answer them, we compared the power of GWAS, TWAS, and a hypothetical protocol that uses real expression data. We derived non-centrality parameters (NCPs) for linear mixed models (LMMs) to enable closed-form calculations of statistical power that do not rely on specific protocol implementations. We examined two representative scenarios, causality (genotype contributes to phenotype through expression) and pleiotropy (genotype contributes directly to both phenotype and expression), and also tested the effects of various properties, including expression heritability. Our analysis reveals two main outcomes: (1) Under pleiotropy, using predicted expression in TWAS gives more power than using actual expression. This explains why TWAS can function with weak expression models, and shows that TWAS remains relevant even when real expression data are available. (2) GWAS outperforms TWAS when expression heritability is below a threshold of 0.04 under causality, or 0.06 under pleiotropy.
Analysis of existing publications suggests that TWAS has been misapplied in place of GWAS in situations where expression heritability is low.

Author summary

We compared the effectiveness of three methods for finding genetic effects on disease in order to quantify their strengths and help researchers choose the best protocol for their data. The genome-wide association study (GWAS) is the standard method for identifying how genetic differences between individuals relate to disease. Recently, the transcriptome-wide association study (TWAS) has improved on GWAS by also estimating the effect of each genetic variant on the activity level (or expression) of genes related to disease. The effectiveness of TWAS is surprising because its estimates of gene expression are very inaccurate, so we ask whether a method using real expression data instead of estimates would perform better. Unlike past studies, which used only simulation to compare these methods, we incorporate novel statistical calculations that make our comparisons more accurate and universally applicable. We find that, depending on the type of relationship between genetics, gene expression, and disease, the estimates used by TWAS can actually be more relevant than real gene expression. We also find that TWAS is not always better than GWAS when the relationship between genetics and expression is weak, and we identify specific thresholds at which past studies have incorrectly used TWAS instead of GWAS.

Funding Information

NSERC Discovery Grant (RGPIN-2018-04328)
NSERC Discovery Grant (RGPIN-2017-04860)
Canada Foundation for Innovation JELF grant (36605)
New Frontiers in Research Fund (NFRFE-2018-00748)
Alberta Children’s Hospital Research Institute (Clinical Research Fund - 10027289)
Alberta Children’s Hospital Research Institute (Startup grant - 10013532)
Alberta Children’s Hospital Research Institute (Postdoctoral fellowship)
