Integrating comprehensive functional annotations to boost power and accuracy in gene-based association analysis

Abstract
Gene-based association tests aggregate genotypes across multiple variants for each gene, providing an interpretable gene-level analysis framework for genome-wide association studies (GWAS). Early applications of gene-based tests often focused on rare coding variants; a more recent wave of gene-based methods, such as transcriptome-wide association studies (TWAS), uses expression quantitative trait loci (eQTLs) to interrogate regulatory associations. Regulatory variants are expected to be particularly valuable for gene-based analysis, since most GWAS associations to date are non-coding. However, identifying causal genes from regulatory associations remains challenging and contentious. Here, we present a statistical framework and computational tool to integrate heterogeneous annotations with GWAS summary statistics for gene-based analysis, applied with comprehensive coding and tissue-specific regulatory annotations. We compare power and accuracy in identifying causal genes across single-annotation, omnibus, and annotation-agnostic gene-based tests in simulation studies and an analysis of 128 traits from the UK Biobank, and find that incorporating heterogeneous annotations in gene-based association analysis increases power and improves performance in identifying causal genes.

Author summary
Gene-based association tests are statistical methods used in genome-wide association studies (GWAS) to identify genes that affect heritable traits. Gene-based tests are formed by aggregating genotypes across multiple genetic variants for each gene, often including only variants that are likely to affect gene function or regulation. In this work, we present a unified framework to integrate heterogeneous classes of functional variants in gene-based association analysis. This approach enables us to simultaneously assess multiple distinct biological mechanisms underlying GWAS association signals, and to construct powerful omnibus tests by aggregating across functional classes for each gene.
We evaluated the performance of gene-based association test methods and strategies to identify causal genes by conducting extensive simulation studies, and by analyzing 128 human traits from the UK Biobank and comparing our results against lists of high-confidence putative causal genes. Our analysis suggests that incorporating heterogeneous functional variants in gene-based association tests increases power to detect gene-based associations and helps identify causal genes.
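
To make the two aggregation steps described above concrete, the following is a minimal sketch and not necessarily the method implemented in this work. It assumes a weighted burden statistic computed from per-variant GWAS z-scores and a linkage disequilibrium (LD) correlation matrix, followed by a Cauchy combination of the per-annotation p-values as one possible omnibus test. The function names, weights, z-scores, and LD values are all hypothetical and chosen only for illustration.

```python
import math


def burden_z(z, weights, ld):
    """Weighted burden statistic from summary statistics (illustrative).

    z       : per-variant GWAS z-scores for one annotation class
    weights : hypothetical annotation-derived variant weights
    ld      : LD correlation matrix between the variants (nested lists)
    Under the null hypothesis the result is approximately standard normal.
    """
    n = len(z)
    numerator = sum(w * zi for w, zi in zip(weights, z))
    variance = sum(weights[i] * ld[i][j] * weights[j]
                   for i in range(n) for j in range(n))
    return numerator / math.sqrt(variance)


def two_sided_p(stat):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(stat) / math.sqrt(2.0))


def cauchy_combine(pvals):
    """Equal-weight Cauchy combination of p-values (assumed omnibus test)."""
    t = sum(math.tan((0.5 - p) * math.pi) for p in pvals) / len(pvals)
    return 0.5 - math.atan(t) / math.pi


# Hypothetical gene with two annotation classes: coding and regulatory.
coding_stat = burden_z([2.1, 1.8], [1.0, 1.0],
                       [[1.0, 0.2], [0.2, 1.0]])
regulatory_stat = burden_z([0.5, 3.0, 2.4], [0.3, 1.0, 0.7],
                           [[1.0, 0.1, 0.0], [0.1, 1.0, 0.4], [0.0, 0.4, 1.0]])

class_pvals = [two_sided_p(coding_stat), two_sided_p(regulatory_stat)]
omnibus_p = cauchy_combine(class_pvals)
print(class_pvals, omnibus_p)
```

The sketch uses a Cauchy combination because it stays valid under arbitrary dependence between the per-class tests, which matters when annotation classes share variants or variants in LD. Other omnibus strategies would fit the same interface.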