CpG Island Methylation in Human Lymphocytes Is Highly Correlated with DNA Sequence, Repeats, and Predicted DNA Structure

Abstract
CpG island methylation plays an important role in epigenetic gene control during mammalian development and is frequently altered in diseases such as cancer. The majority of CpG islands are normally unmethylated, but a sizeable fraction is prone to become methylated in various cell types and pathological situations. The goal of this study is to show that a computational epigenetics approach can discriminate CpG islands that are prone to methylation from those that remain unmethylated. We develop a bioinformatics scoring and prediction method based on a set of 1,184 DNA attributes, which refer to sequence, repeats, predicted structure, CpG islands, genes, predicted binding sites, conservation, and single nucleotide polymorphisms. These attributes are scored on 132 CpG islands across the entire human Chromosome 21, whose methylation status was previously established for normal human lymphocytes. Our results show that three groups of DNA attributes, namely certain sequence patterns, specific DNA repeats, and a particular DNA structure, are each highly correlated with CpG island methylation (correlation coefficients of 0.64, 0.66, and 0.49, respectively). We predicted and subsequently experimentally examined 12 CpG islands from human Chromosome 21 with unknown methylation patterns and found more than 90% of our predictions to be correct. In addition, we applied our prediction method to Human Epigenome Project methylation data on human Chromosome 6 and again observed high prediction accuracy. In summary, our results suggest that the DNA composition of CpG islands (sequence, repeats, and structure) plays a significant role in predisposing CpG islands to DNA methylation. This finding may have a strong impact on our understanding of changes in CpG island methylation in development and disease.

DNA methylation is the only epigenetic mechanism in eukaryotes that is known to directly modify the DNA. It plays an important role in gene control during development and cell differentiation, and it is a promising therapeutic target in cancer research. While a genome-wide picture of DNA methylation patterns is currently emerging, we have only fragmentary knowledge about the links between DNA methylation and other genomic attributes such as DNA sequence and structure, repetitive elements, or sequence conservation. The authors fill this gap by reporting a comprehensive bioinformatic analysis of DNA methylation on human Chromosome 21, extending in part to other regions of the human genome. They report new associations that will help elucidate the functions of DNA methylation along the human genome. Furthermore, the authors show that their findings can be applied to predicting DNA methylation patterns from genome sequence. Such predictions have the potential to speed up genome-wide epigenetic profiling: it may be possible to focus experimental resources on a few selected areas while bioinformatics procedures are applied to the bulk of the genome.
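
The general idea of scoring a DNA attribute on each CpG island and correlating it with methylation status can be illustrated with a minimal sketch. This is not the authors' pipeline, which uses 1,184 attributes and a dedicated prediction method. It assumes a single attribute (the widely used observed/expected CpG ratio), a 0/1 coding of methylation status, and the plain Pearson coefficient, which may differ from the correlation measure used in the study. The function names and the toy sequences are hypothetical and carry no biological meaning.

```python
from math import sqrt


def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio: (#CG * length) / (#C * #G)."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0  # ratio is undefined without both C and G; report 0
    return seq.count("CG") * len(seq) / (c * g)


def pearson(xs, ys):
    """Pearson correlation; with 0/1 labels in ys this is the point-biserial r."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long samples with at least 2 values")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical toy data, for illustration only:
    # (sequence, methylation status) with 1 = methylated, 0 = unmethylated.
    toy_islands = [
        ("CGCGCGTACGCGGCGC", 0),
        ("CGGCGCGATCGCGCGA", 0),
        ("CAGTGCTGCAGGCTCA", 1),
        ("GCTGCACCTGGCAGTC", 1),
    ]
    scores = [cpg_obs_exp(seq) for seq, _ in toy_islands]
    labels = [status for _, status in toy_islands]
    print("attribute scores:", scores)
    print("correlation with methylation:", pearson(scores, labels))
```

In a real analysis, each attribute would be scored on experimentally characterized CpG islands, and the attributes most strongly correlated with methylation would then feed a prediction method evaluated on held-out islands.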