Hyperconserved CpG domains underlie Polycomb-binding sites

Abstract
Comparative genomics of CpG dinucleotides, which are targets of DNA methyltransferases in vertebrate genomes, has been constrained by their evolutionary instability and by the effect of methylation on their mutation rates. We compared the human and chimpanzee genomes to identify DNA sequence signatures correlated with rates of mutation at CpG dinucleotides. We used these new signatures to develop a robust comparative analysis of CpG dinucleotides in regions of heterogeneous sequence composition and to identify genomic domains with anomalous CpG divergence rates. The analysis revealed approximately 200 genomic regions whose CpG distributions are far more conserved than predicted. These hyperconserved CpG domains largely coincide with domains bound by Polycomb repressive complex 2 in undifferentiated human embryonic stem cells and are almost exclusively located near genes whose products regulate embryonic development. We showed experimentally that several of these domains are unmethylated at different developmental stages. These data indicate that particular evolutionary patterns and distinct sequence properties, on scales much larger than typical transcription factor-binding sites, may play an important role in Polycomb recruitment and in the transcriptional regulation of key developmental genes.
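To make the notion of "CpG divergence" in a region concrete, the sketch below compares two aligned sequences (for example, a human and a chimpanzee window) and reports the fraction of CpG sites, pooled across both species, that are present in only one of them. This is a hypothetical illustration, not the authors' model: the function names `cpg_positions` and `cpg_divergence` are invented here, gaps are assumed to be written as `-`, and the sketch does not attempt the signature-based prediction of expected divergence that the study uses to call hyperconserved domains.

```python
def cpg_positions(seq: str) -> set[int]:
    """Return the alignment columns i where seq[i:i+2] is a CpG ('CG').

    Case-insensitive. Gap characters ('-' assumed) never match, so a CpG
    split by a gap in the alignment is not counted.
    """
    s = seq.upper()
    return {i for i in range(len(s) - 1) if s[i] == "C" and s[i + 1] == "G"}


def cpg_divergence(human: str, chimp: str) -> float:
    """Fraction of CpG sites (union over both species) not shared by both.

    Hypothetical metric for illustration only; assumes the two sequences
    are already aligned column-by-column to the same length.
    """
    if len(human) != len(chimp):
        raise ValueError("sequences must be aligned to equal length")
    h = cpg_positions(human)
    c = cpg_positions(chimp)
    union = h | c
    if not union:
        return 0.0  # no CpG sites in either sequence: nothing to diverge
    # Sites present in exactly one species (symmetric difference).
    return len(h ^ c) / len(union)


if __name__ == "__main__":
    # The human window has CpGs at columns 1 and 4; the chimp window keeps
    # only column 1 (CpG -> TpG at column 4), so half the sites differ.
    print(cpg_divergence("ACGTCGA", "ACGTTGA"))  # 0.5
```

Under this toy metric, a region would look "hyperconserved" when its observed divergence is much lower than the divergence expected from its sequence context; how that expectation is modeled is the substance of the study and is not reproduced here.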