Leveraging biological and statistical covariates improves the detection power in epigenome-wide association testing
Open Access
- 6 April 2020
- journal article
- research article
- Published by Springer Science and Business Media LLC in Genome Biology
- Vol. 21 (1), 1-19
- https://doi.org/10.1186/s13059-020-02001-7
Abstract
Background: Epigenome-wide association studies (EWAS), which seek associations between epigenetic marks and an outcome or exposure, involve multiple hypothesis testing. False discovery rate (FDR) control is widely used for multiple testing correction. However, traditional FDR control methods do not use auxiliary covariates, and they can be less powerful when such covariates are informative about the likelihood of the null hypothesis. Many covariate-adaptive FDR control methods have recently been developed, but their application to EWAS data has not yet been explored. It is not clear whether these methods can significantly improve detection power, and if so, which covariates are most relevant for EWAS data.
Results: In this study, we evaluate the performance of five covariate-adaptive FDR control methods with EWAS-related covariates using simulated as well as real EWAS datasets. We develop an omnibus test to assess the informativeness of the covariates. We find that statistical covariates are generally more informative than biological covariates, and that the methylation mean and variance are almost universally informative. In contrast, the informativeness of biological covariates depends on the specific dataset. We show that the independent hypothesis weighting (IHW) and covariate adaptive multiple testing (CAMT) methods are overall more powerful, especially for sparse signals, and could improve the detection power by a median of 25% and 68% on real datasets, compared to the ST procedure. We further validate the findings in various biological contexts.
Conclusions: Covariate-adaptive FDR control methods with informative covariates can significantly increase the detection power for EWAS. For sparse signals, IHW and CAMT are recommended.
Funding Information
- National Key Research and Development Plan of China Grants (No. 2018YFA0107802)
- National Natural Science Foundation of China (NSFC) General Program (No. 81570122, 81770205)
- National Key Research and Development Program (No. 2016YFC0902800)
- Shanghai Municipal Education Commission-Gaofeng Clinical Medicine Grant Support (No. 20161303)
- Center for Individualized Medicine, Mayo Clinic
- US National Science Foundation grants (DMS-1830392, DMS-1811747)