Semantic Disease Gene Embeddings (SmuDGE): phenotype-based disease gene prioritization without phenotypes

Open Access

1 September 2018

Published by Oxford University Press (OUP) in Bioinformatics

Vol. 34 (17), i901-i907
https://doi.org/10.1093/bioinformatics/bty559

Abstract

In recent years, several methods have been developed to incorporate information about phenotypes into computational disease gene prioritization. These methods commonly compute the similarity between a disease's (or patient's) phenotypes and a database of gene-to-phenotype associations to find the phenotypically most similar match. A key limitation of these methods is their reliance on knowledge about the phenotypes associated with particular genes, which is highly incomplete in humans as well as in many model organisms such as the mouse. We developed SmuDGE, a method that uses feature learning to generate vector-based representations of the phenotypes associated with an entity. SmuDGE can be used as a trainable semantic similarity measure to compare two sets of phenotypes (for example, between a disease and a gene, or between a disease and a patient). More importantly, SmuDGE can generate phenotype representations for entities that are only indirectly associated with phenotypes through an interaction network; for this purpose, it exploits background knowledge in interaction networks composed of multiple types of interactions. We demonstrate that SmuDGE can match or outperform semantic similarity measures in phenotype-based disease gene prioritization and, furthermore, significantly extends the coverage of phenotype-based methods to all genes in a connected interaction network.

Availability and implementation: https://github.com/bio-ontology-research-group/SmuDGE
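The abstract only summarizes how phenotype representations can reach genes that have no phenotype annotations of their own. The following is a toy sketch under stated assumptions, not the authors' implementation (their code is in the repository linked above). It builds a small hypothetical graph of phenotype annotations and labelled interactions, generates random walks that interleave entity and relation labels, and represents each entity by counts of the entities that co-occur with it within two hops along those walks. These counts are a deliberately simple stand-in for a learned embedding such as a skip-gram model. Candidate genes are then ranked by cosine similarity to the disease vector. All entity names, relation labels, helper functions and parameters (`walks_per_node`, `walk_length`, `hops`) are hypothetical.

```python
import math
import random
from collections import Counter, defaultdict

# Hypothetical toy graph of (head, relation, tail) triples. Real inputs would be
# ontology-based phenotype annotations plus a multi-relational interaction network.
TRIPLES = [
    ("DiseaseX", "has_phenotype", "HP:seizure"),
    ("DiseaseX", "has_phenotype", "HP:ataxia"),
    ("GeneA", "has_phenotype", "HP:seizure"),
    ("GeneA", "has_phenotype", "HP:ataxia"),
    ("GeneB", "binding", "GeneA"),       # GeneB has no phenotype annotations
    ("GeneC", "has_phenotype", "HP:obesity"),
    ("GeneD", "activation", "GeneC"),    # component not connected to DiseaseX
]


def build_graph(triples):
    """Adjacency list of (relation, neighbour); edges are added in both directions."""
    graph = defaultdict(list)
    for head, rel, tail in triples:
        graph[head].append((rel, tail))
        graph[tail].append((rel, head))
    return graph


def random_walks(graph, walks_per_node=100, walk_length=6, seed=0):
    """Edge-labelled walks: [entity, relation, entity, relation, ...]."""
    rng = random.Random(seed)
    walks = []
    for start in sorted(graph):
        for _ in range(walks_per_node):
            node, walk = start, [start]
            for _ in range(walk_length):
                rel, node = rng.choice(graph[node])
                walk.extend([rel, node])
            walks.append(walk)
    return walks


def context_vectors(walks, hops=2):
    """Count entities within `hops` steps of each entity along the walks.

    Entities sit at even positions and one hop spans two tokens. Relation tokens
    shape the walks but are skipped as context features in this count-based
    stand-in for a learned embedding.
    """
    vectors = defaultdict(Counter)
    for walk in walks:
        for i in range(0, len(walk), 2):
            lo = max(0, i - 2 * hops)
            hi = min(len(walk), i + 2 * hops + 1)
            for j in range(lo, hi, 2):
                if j != i:
                    vectors[walk[i]][walk[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * v[key] for key, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_genes(disease, genes, vectors):
    """Rank candidate genes by similarity to the disease representation."""
    scores = {gene: cosine(vectors[disease], vectors[gene]) for gene in genes}
    return sorted(scores, key=scores.get, reverse=True), scores


if __name__ == "__main__":
    vectors = context_vectors(random_walks(build_graph(TRIPLES)))
    ranking, scores = rank_genes("DiseaseX", ["GeneA", "GeneB", "GeneC", "GeneD"], vectors)
    for gene in ranking:
        print(f"{gene}\t{scores[gene]:.3f}")
```

With these toy inputs, GeneA, which shares both phenotypes with DiseaseX, should rank first. GeneB has no phenotype annotations, yet it receives a non-zero score through its binding interaction with GeneA, which mirrors the coverage effect described in the abstract. GeneC and GeneD score 0 because they are not connected to DiseaseX.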

Funding Information

King Abdullah University of Science and Technology (KAUST) Office of Sponsored Research (OSR): URF/1/3454-01-01, FCC/1/1976-08-01