Phenotype prediction based on genome-wide DNA methylation data

Open Access

17 June 2014

journal article
Published by Springer Science and Business Media LLC in BMC Bioinformatics

Vol. 15 (1), 193
https://doi.org/10.1186/1471-2105-15-193

Abstract

DNA methylation (DNAm) has important regulatory roles in many biological processes and diseases. It is the only epigenetic mark with a clear mechanism of mitotic inheritance and the only one easily available on a genome scale. Aberrant cytosine-phosphate-guanine (CpG) methylation has been discussed in the context of disease aetiology, especially cancer. CpG hypermethylation of promoter regions is often associated with silencing of tumour suppressor genes and hypomethylation with activation of oncogenes.
Supervised principal component analysis (SPCA) is a popular machine learning method. However, in a recent application to phenotype prediction from DNAm data, SPCA was inferior to the specific method EVORA. We present Model-Selection-SPCA (MS-SPCA), an enhanced version of SPCA. MS-SPCA applies several models that perform well in the training data to the test data and selects the very best models for final prediction based on parameters of the test data.
We have applied MS-SPCA for phenotype prediction from genome-wide DNAm data. CpGs used for prediction are selected based on the quantification of three features of their methylation (average methylation difference, methylation variation difference and methylation-age-correlation). We analysed four independent case-control datasets that correspond to different stages of cervical cancer: (i) cases that are currently cytologically normal but will later develop neoplastic transformations, (ii, iii) cases showing neoplastic transformations and (iv) cases with confirmed cancer. The first dataset was split into several smaller case-control datasets (samples either Human Papilloma Virus (HPV) positive or negative). We demonstrate that cytologically normal HPV+ and HPV- samples contain DNAm patterns which are associated with later neoplastic transformations.
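The three per-CpG features named above can be illustrated with a minimal sketch. The abstract does not give the exact definitions, so the formulas below (difference of case and control means, difference of case and control variances, Pearson correlation with age) are assumptions, and the names `cpg_features` and `pearson` are hypothetical.

```python
from statistics import mean, pvariance


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def cpg_features(beta, is_case, age):
    """Score one CpG on three assumed features.

    beta    -- methylation value (0..1) per sample
    is_case -- True for cases, False for controls, per sample
    age     -- age per sample
    """
    cases = [b for b, c in zip(beta, is_case) if c]
    controls = [b for b, c in zip(beta, is_case) if not c]
    return {
        # assumed definition: mean(cases) - mean(controls)
        "mean_diff": mean(cases) - mean(controls),
        # assumed definition: var(cases) - var(controls)
        "var_diff": pvariance(cases) - pvariance(controls),
        # assumed definition: Pearson correlation of methylation with age
        "age_corr": pearson(beta, age),
    }


# Toy values, not taken from the study.
features = cpg_features(
    beta=[0.1, 0.2, 0.5, 0.7],
    is_case=[False, False, True, True],
    age=[30, 40, 50, 60],
)
```

In this toy example the CpG is more methylated and more variable in cases, and its methylation rises with age; how the study combines such scores to rank CpGs is not stated in the abstract.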
We present evidence that DNAm patterns exist in cytology normal HPV- samples that (i) predispose to neoplastic transformations after HPV infection and (ii) predispose to HPV infection itself. MS-SPCA performs significantly better than EVORA. MS-SPCA can be applied to many classification problems. Additional improvements could include usage of more than one principal component (PC), with automatic selection of the optimal number of PCs. We expect that MS-SPCA will be useful for analysing recent larger DNAm data to predict future neoplastic transformations.
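For orientation, the generic SPCA idea that MS-SPCA builds on can be sketched as follows: keep the features most associated with the label, project samples onto the first principal component of those features, and classify on that score. This is a plain SPCA sketch with a single PC, not the authors' MS-SPCA; the model-selection step is omitted because the abstract does not specify the test-data parameters it uses. The function names, the correlation-based feature score and the midpoint cut-off are all assumptions.

```python
from statistics import mean


def _corr(xs, ys):
    """Pearson correlation (0.0 if either input is constant)."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return 0.0 if sxx == 0 or syy == 0 else sxy / (sxx * syy) ** 0.5


def _first_pc(rows, iters=200):
    """Leading principal axis of centred rows, by power iteration."""
    p = len(rows[0])
    v = [float(j + 1) for j in range(p)]  # arbitrary non-zero start vector
    for _ in range(iters):
        proj = [sum(r[j] * v[j] for j in range(p)) for r in rows]  # X v
        w = [sum(s * r[j] for s, r in zip(proj, rows)) for j in range(p)]  # X^T X v
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0:
            break
        v = [x / norm for x in w]
    return v


def fit_spca(X, y, k):
    """X: samples x CpGs (list of lists), y: 0/1 labels, k: number of CpGs kept."""
    p = len(X[0])
    scores = [abs(_corr([row[j] for row in X], y)) for j in range(p)]
    keep = sorted(range(p), key=lambda j: scores[j], reverse=True)[:k]
    mu = [mean(row[j] for row in X) for j in keep]
    centred = [[row[j] - m for j, m in zip(keep, mu)] for row in X]
    axis = _first_pc(centred)
    pc = [sum(a * c for a, c in zip(axis, r)) for r in centred]
    m1 = mean(s for s, lab in zip(pc, y) if lab == 1)
    m0 = mean(s for s, lab in zip(pc, y) if lab == 0)
    if m1 < m0:  # orient the axis so that cases score higher
        axis = [-a for a in axis]
        m0, m1 = -m0, -m1
    # assumed decision rule: cut halfway between the class means
    return {"keep": keep, "mu": mu, "axis": axis, "cut": (m0 + m1) / 2}


def predict_spca(model, X):
    """Return 0/1 predictions from the first-PC score."""
    preds = []
    for row in X:
        s = sum(a * (row[j] - m)
                for a, j, m in zip(model["axis"], model["keep"], model["mu"]))
        preds.append(1 if s > model["cut"] else 0)
    return preds


# Toy data, not from the study: two informative CpGs and one uninformative CpG.
X_train = [
    [0.10, 0.20, 0.50], [0.15, 0.25, 0.40], [0.12, 0.22, 0.60],
    [0.60, 0.70, 0.45], [0.65, 0.75, 0.55], [0.62, 0.72, 0.50],
]
y_train = [0, 0, 0, 1, 1, 1]
model = fit_spca(X_train, y_train, k=2)
predictions = predict_spca(model, [[0.11, 0.21, 0.90], [0.63, 0.73, 0.10]])
```

Using more than one PC, as the abstract suggests for future work, would replace the single score with a small vector of scores and a classifier fitted on them.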

