De novo prediction of RNA–protein interactions from sequence information
- 1 January 2013
- journal article
- research article
- Published by Royal Society of Chemistry (RSC) in Molecular BioSystems
- Vol. 9 (1), 133-142
- https://doi.org/10.1039/c2mb25292a
Abstract
Protein–RNA interactions are fundamentally important in understanding cellular processes. In particular, non-coding RNA–protein interactions play an important role in facilitating biological functions in signalling, transcriptional regulation, and even the progression of complex diseases. However, experimental determination of protein–RNA interactions remains time-consuming and labour-intensive. Here, we develop a novel extended naïve Bayes classifier for de novo prediction of protein–RNA interactions, using only protein and RNA sequence information. Specifically, we first collect a set of known protein–RNA interactions as gold-standard positives and extract sequence-based features to represent each protein–RNA pair. To bridge the gap between the high dimensionality of the features and the scarcity of gold-standard positives, we select effective features by applying a cutoff to a likelihood ratio score, which not only reduces the computational complexity but also allows transparent feature integration during prediction. An extended naïve Bayes classifier is then constructed using these effective features to train a protein–RNA interaction prediction model. Numerical experiments show that our method achieves a prediction accuracy of 0.77 even though only a small number of protein–RNA interaction data are available. In particular, we demonstrate that the extended naïve Bayes classifier is superior to the naïve Bayes classifier because it fully considers the dependences among features. Importantly, we conduct ncRNA pull-down experiments to validate the predicted novel protein–RNA interactions and identify the interacting proteins of sbRNA CeN72 in C. elegans, which further demonstrates the effectiveness of our method.
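The feature-selection and scoring steps described in the abstract can be illustrated with a minimal sketch. It assumes binary features, a Laplace pseudocount, a toy data set, and a cutoff of 2.0, none of which are taken from the paper, and it implements only a plain naïve Bayes score: the dependence handling of the extended classifier is not reproduced here. All function names are hypothetical.

```python
import math

def likelihood_ratios(pos, neg, pseudocount=1.0):
    """Per-feature ratio P(f=1 | positive) / P(f=1 | negative).

    pos and neg are lists of binary feature vectors. The pseudocount
    (an assumption of this sketch) avoids division by zero.
    """
    ratios = []
    for j in range(len(pos[0])):
        p_pos = (sum(row[j] for row in pos) + pseudocount) / (len(pos) + 2 * pseudocount)
        p_neg = (sum(row[j] for row in neg) + pseudocount) / (len(neg) + 2 * pseudocount)
        ratios.append(p_pos / p_neg)
    return ratios

def select_features(ratios, cutoff):
    """Keep the indices of features whose likelihood ratio reaches the cutoff."""
    return [j for j, r in enumerate(ratios) if r >= cutoff]

def naive_bayes_score(x, ratios, selected):
    """Sum of log likelihood ratios over the selected features present in x.

    This treats features as independent, i.e. it is the plain naive Bayes
    score, not the extended variant described in the abstract.
    """
    return sum(math.log(ratios[j]) for j in selected if x[j] == 1)

# Toy data (hypothetical): rows are protein-RNA pairs, columns are binary features.
positives = [[1, 1, 0], [1, 0, 0], [1, 1, 1]]
negatives = [[0, 1, 0], [0, 0, 1], [0, 1, 0]]

ratios = likelihood_ratios(positives, negatives)
selected = select_features(ratios, cutoff=2.0)
score_with_feature = naive_bayes_score([1, 0, 0], ratios, selected)
score_without_feature = naive_bayes_score([0, 1, 1], ratios, selected)
```

In this toy example only the first feature is enriched among the positives, so it is the only one retained, and a pair scores above zero only when that feature is present. Each retained feature contributes a separate, inspectable term to the score, which is one way to read the "transparent feature integration" mentioned in the abstract.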