Predicting genome-wide DNA methylation using methylation marks, genomic position, and DNA regulatory elements
Open Access
- 24 January 2015
- journal article
- research article
- Published by Springer Science and Business Media LLC in Genome Biology
- Vol. 16 (1), 1-20
- https://doi.org/10.1186/s13059-015-0581-9
Abstract
Recent assays for individual-specific genome-wide DNA methylation profiles have enabled epigenome-wide association studies to identify specific CpG sites associated with a phenotype. Computational prediction of CpG site-specific methylation levels is critical to enable genome-wide analyses, but current approaches tackle average methylation within a locus and are often limited to specific genomic regions. We characterize genome-wide DNA methylation patterns, and show that correlation among CpG sites decays rapidly, making predictions solely based on neighboring sites challenging. We built a random forest classifier to predict methylation levels at CpG site resolution using features including neighboring CpG site methylation levels and genomic distance, co-localization with coding regions, CpG islands (CGIs), and regulatory elements from the ENCODE project. Our approach achieves 92% prediction accuracy of genome-wide methylation levels at single-CpG-site precision. The accuracy increases to 98% when restricted to CpG sites within CGIs and is robust across platform and cell-type heterogeneity. Our classifier outperforms other types of classifiers and identifies features that contribute to prediction accuracy: neighboring CpG site methylation, CGIs, co-localized DNase I hypersensitive sites, transcription factor binding sites, and histone modifications were found to be most predictive of methylation levels. Our observations of DNA methylation patterns led us to develop a classifier to predict DNA methylation levels at CpG site resolution with high accuracy. Furthermore, our method identified genomic features that interact with DNA methylation, suggesting mechanisms involved in DNA methylation modification and regulation, and linking diverse epigenetic processes.
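As a rough illustration of the kind of per-site features the abstract describes (neighboring CpG methylation, genomic distance to those neighbors, and co-localization with a CpG island), the following minimal Python sketch builds a feature row and a binary label for each CpG site. It is not the authors' pipeline: the toy data, the function names, the dictionary layout, and the 0.5 threshold for calling a site methylated are all assumptions made for this example.

```python
# Hypothetical toy data: (genomic position, methylation level in [0, 1])
# for CpG sites on one chromosome, sorted by position.
SITES = [(100, 0.91), (180, 0.85), (1200, 0.10), (1260, 0.05), (5000, 0.70)]

# Hypothetical CpG island (CGI) intervals as inclusive (start, end) pairs.
CGIS = [(1100, 1400)]


def binarize(beta, threshold=0.5):
    """Assumed labeling rule: 1 = methylated, 0 = unmethylated."""
    return 1 if beta >= threshold else 0


def in_any_interval(pos, intervals):
    """Return True if pos falls inside at least one (start, end) interval."""
    return any(start <= pos <= end for start, end in intervals)


def site_features(index, sites, cgis):
    """Build one feature row from the nearest upstream and downstream sites."""
    pos, _ = sites[index]
    features = {"in_cgi": int(in_any_interval(pos, cgis))}
    for name, j in (("upstream", index - 1), ("downstream", index + 1)):
        if 0 <= j < len(sites):
            neighbor_pos, neighbor_beta = sites[j]
            features[f"{name}_beta"] = neighbor_beta
            features[f"{name}_distance"] = abs(neighbor_pos - pos)
        else:
            # No neighbor on this side (first or last site in the list).
            features[f"{name}_beta"] = None
            features[f"{name}_distance"] = None
    return features


def build_dataset(sites, cgis):
    """Return (feature rows, binary labels) for every site."""
    rows = [site_features(i, sites, cgis) for i in range(len(sites))]
    labels = [binarize(beta) for _, beta in sites]
    return rows, labels


if __name__ == "__main__":
    rows, labels = build_dataset(SITES, CGIS)
    for row, label in zip(rows, labels):
        print(label, row)
```

Rows of this shape, once missing values are handled, could be passed to any random forest implementation (for example, scikit-learn's `RandomForestClassifier`). The regulatory-element features mentioned in the abstract would be added as further columns in the same way as the `in_cgi` flag.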