Predicting RNA-Protein Interactions Using Only Sequence Information
Open Access
- 22 December 2011
- journal article
- Published by Springer Science and Business Media LLC in BMC Bioinformatics
- Vol. 12 (1), 489
- https://doi.org/10.1186/1471-2105-12-489
Abstract
RNA-protein interactions (RPIs) play important roles in a wide variety of cellular processes, ranging from transcriptional and post-transcriptional regulation of gene expression to host defense against pathogens. High-throughput experiments to identify RNA-protein interactions are beginning to provide valuable information about the complexity of RNA-protein interaction networks, but they are expensive and time-consuming. Hence, there is a need for reliable computational methods for predicting RNA-protein interactions. We propose RPISeq, a family of classifiers for predicting RNA-protein interactions using only sequence information. Given the sequences of an RNA and a protein as input, RPISeq predicts whether or not the RNA-protein pair interacts. The RNA sequence is encoded as a normalized vector of its ribonucleotide 4-mer composition, and the protein sequence is encoded as a normalized vector of its 3-mer composition, based on a 7-letter reduced alphabet representation. Two variants of RPISeq are presented: RPISeq-SVM, which uses a Support Vector Machine (SVM) classifier, and RPISeq-RF, which uses a Random Forest classifier. On two non-redundant benchmark datasets extracted from the Protein-RNA Interface Database (PRIDB), RPISeq achieved an AUC (Area Under the Receiver Operating Characteristic (ROC) curve) of 0.96 and 0.92. On a third dataset containing only mRNA-protein interactions, the performance of RPISeq was competitive with that of a published method that requires information regarding many different features (e.g., mRNA half-life, GO annotations) of the putative RNA and protein partners. In addition, RPISeq classifiers trained using the PRIDB data correctly predicted the majority (57-99%) of non-coding RNA-protein interactions in NPInter-derived networks from E. coli, S. cerevisiae, D. melanogaster, M. musculus, and H. sapiens. Our experiments with RPISeq demonstrate that RNA-protein interactions can be reliably predicted using only sequence-derived information.
RPISeq offers an inexpensive method for computational construction of RNA-protein interaction networks, and should provide useful insights into the function of non-coding RNAs. RPISeq is freely available as a web-based server at http://pridb.gdcb.iastate.edu/RPISeq/.
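
The sequence encoding described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. Two details are assumptions that the abstract does not state: the 7-class amino acid grouping (taken here from the conjoint triad scheme commonly used in sequence-based interaction prediction) and the normalization (each count divided by the total number of k-mers). The function names are hypothetical.

```python
from itertools import product

# ASSUMPTION: 7-class amino acid grouping in the style of the conjoint
# triad method; the abstract only says "7-letter reduced alphabet".
GROUPS = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
AA_TO_CLASS = {aa: str(i) for i, group in enumerate(GROUPS) for aa in group}


def kmer_vector(seq, alphabet, k):
    """Return the normalized k-mer composition of seq over alphabet.

    ASSUMPTION: normalization divides each count by the total number of
    counted k-mers. k-mers containing symbols outside the alphabet are skipped.
    """
    keys = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(keys, 0)
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:
            counts[kmer] += 1
    total = sum(counts.values())
    return [counts[key] / total if total else 0.0 for key in keys]


def encode_rna(rna):
    """4-mer composition over A, C, G, U: 4**4 = 256 features."""
    return kmer_vector(rna.upper().replace("T", "U"), "ACGU", 4)


def encode_protein(protein):
    """3-mer composition over the 7-class alphabet: 7**3 = 343 features."""
    reduced = "".join(AA_TO_CLASS.get(aa, "X") for aa in protein.upper())
    return kmer_vector(reduced, "0123456", 3)


def encode_pair(rna, protein):
    """Concatenate both vectors into one 599-dimensional feature vector."""
    return encode_rna(rna) + encode_protein(protein)


if __name__ == "__main__":
    features = encode_pair("ACGUACGUGGCA", "MKVLAAGRDEC")
    print(len(features))
```

A vector built this way could then be passed to any standard SVM or Random Forest implementation, which is how the two RPISeq variants differ according to the abstract.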