Multitask learning for host–pathogen protein interactions
Open Access
- 19 June 2013
- journal article
- research article
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 29 (13), i217-i226
- https://doi.org/10.1093/bioinformatics/btt245
Abstract
Motivation: An important aspect of infectious disease research involves understanding the differences and commonalities in the infection mechanisms underlying various diseases. Systems biology-based approaches study infectious diseases by analyzing the interactions between the host species and the pathogen organisms. This work aims to combine the knowledge from experimental studies of host–pathogen interactions in several diseases to build stronger predictive models. Our approach is based on a formalism from machine learning called ‘multitask learning’, which considers the problem of building models across tasks that are related to each other. A ‘task’ in our scenario is the set of host–pathogen protein interactions involved in one disease. To integrate interactions from several tasks (i.e. diseases), our method exploits the similarity in the infection process across the diseases. In particular, we use the biological hypothesis that similar pathogens target the same critical biological processes in the host, in defining a common structure across the tasks.
Results: Our current work on host–pathogen protein interaction prediction focuses on human as the host, and four bacterial species as pathogens. The multitask learning technique we develop uses a task-based regularization approach. We find that the resulting optimization problem is a difference of convex (DC) functions. To optimize, we implement a Convex–Concave procedure-based algorithm. We compare our integrative approach to baseline methods that build models on a single host–pathogen protein interaction dataset. Our results show that our approach outperforms the baselines on the training data. We further analyze the protein interaction predictions generated by the models, and find some interesting insights.
Availability: The predictions and code are available at: http://www.cs.cmu.edu/~mkshirsa/ismb2013_paper320.html
Contact: j.klein-seetharaman@warwick.ac.uk
Supplementary information: Supplementary data are available at Bioinformatics online.
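The abstract states that the regularized objective is a difference of convex (DC) functions and that it is optimized with a Convex–Concave procedure (CCCP). As a rough illustration of that general technique only, the sketch below applies CCCP to a one-dimensional toy objective, f(x) = x^4 - 2x^2, split as the convex part u(x) = x^4 minus the convex part v(x) = 2x^2. The toy objective, the function names `cccp_minimize` and `f`, and the stopping rule are all assumptions made for illustration; none of them are taken from the paper, whose actual objective and solver are not given in the abstract.

```python
import math


def f(x):
    """Toy DC objective (an assumption, not the paper's): u(x) - v(x)."""
    return x**4 - 2.0 * x**2


def cccp_minimize(x0, steps=100, tol=1e-12):
    """Run CCCP on f(x) = x**4 - 2*x**2 starting from x0.

    Each iteration replaces v(x) = 2*x**2 by its tangent at the current
    point and minimizes the resulting convex surrogate exactly.
    """
    x = x0
    for _ in range(steps):
        # Gradient of v at the current iterate: v'(x) = 4x.
        slope = 4.0 * x
        # Convex subproblem: argmin_z z**4 - slope * z, i.e. 4 z**3 = slope.
        # Take the real cube root, preserving the sign of slope.
        x_new = math.copysign(abs(slope / 4.0) ** (1.0 / 3.0), slope)
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x


if __name__ == "__main__":
    x_star = cccp_minimize(0.5)
    print(x_star, f(x_star))
```

Because each surrogate upper-bounds f and touches it at the current iterate, the objective value cannot increase from one step to the next. Like any CCCP scheme, the iteration finds a stationary point that depends on the starting value: in this toy case a positive start approaches x = 1 and a negative start approaches x = -1.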