A network-based machine-learning framework to identify both functional modules and disease genes

7 January 2021

journal article
research article
Published by Springer Science and Business Media LLC in Human Genetics

Vol. 140 (6), 897-913
https://doi.org/10.1007/s00439-020-02253-0

Abstract

Disease gene identification is a critical step towards uncovering the molecular mechanisms of diseases and systematically investigating complex disease phenotypes. Despite considerable efforts to develop powerful computational methods, candidate gene identification remains a severe challenge owing to the connectivity of the incomplete interactome network, which hampers the discovery of truly novel candidate genes. We developed a network-based machine-learning framework to identify both functional modules and disease candidate genes. Within this framework, we designed a semi-supervised non-negative matrix factorization model to obtain the functional modules related to diseases and genes. We also proposed a disease gene prioritization method, MapGene, that integrates correlations from both functional modules and network closeness. Our framework identified a set of functional modules with high functional homogeneity and close gene interactions. Experiments on a large-scale benchmark dataset showed that MapGene performs significantly better than state-of-the-art algorithms. Further analysis demonstrated that MapGene can effectively mitigate the impact of interactome incompleteness and produce highly reliable rankings of candidate genes. In addition, case studies on Parkinson’s disease and diabetes mellitus confirmed that MapGene generalizes to novel candidate gene identification. This work proposed, for the first time, an integrated computational framework to predict both functional modules and disease candidate genes. The methodology and results indicate that our framework has the potential to help discover underlying functional modules and reliable candidate genes in human disease.
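
As an illustration of the factorization step mentioned in the abstract, the following is a minimal sketch of plain non-negative matrix factorization using the standard multiplicative updates for the Frobenius objective. It is not the paper's semi-supervised model: the abstract does not specify the supervision term, so it is omitted here. The toy matrix `V` (diseases by genes) and the names `nmf`, `matmul`, `transpose`, and `frobenius_error` are assumptions made for this example only.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def transpose(a):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def frobenius_error(v, w, h):
    """Squared Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, rank, n_iter=200, seed=0, eps=1e-9):
    """Factor a non-negative matrix V (n x m) into W (n x rank) and H (rank x m)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    # Random strictly positive initialisation.
    w = [[rng.random() + eps for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + eps for _ in range(m)] for _ in range(rank)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H), element-wise.
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)]
             for k in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise.
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n)]
    return w, h


# Hypothetical toy data: 4 diseases (rows) x 6 genes (columns), 1 = known association.
V = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 0, 1, 1],
]

W, H = nmf(V, rank=2)
# Assign each gene to the latent factor ("module") with the largest loading in H.
gene_modules = [max(range(len(H)), key=lambda k: H[k][j]) for j in range(len(V[0]))]
print("reconstruction error:", frobenius_error(V, W, H))
print("module per gene:", gene_modules)
```

In this sketch, rows of `W` give disease loadings and columns of `H` give gene loadings on each latent factor; reading the largest loading per gene as a module assignment is one simple interpretation, not necessarily the one used by the authors.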

Funding Information

National key research and development program (2017YFC1703506, 2017YFC1703502)
National Science and Technology Major Project (2019ZX09201005-002-006)
Fundamental Research Funds for the Central Universities (2018JBZ006)
Special Programs of Traditional Chinese Medicine (JDZX2015170, JDZX2015171)

