Deep architectures for protein contact map prediction
Open Access
- 30 July 2012
- journal article
- research article
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 28 (19), 2449-2457
- https://doi.org/10.1093/bioinformatics/bts475
Abstract
Motivation: Residue–residue contact prediction is important for protein structure prediction and other applications. However, the accuracy of current contact predictors often barely exceeds 20% on long-range contacts, falling short of the level required for ab initio structure prediction.
Results: Here, we develop a novel machine learning approach for contact map prediction using three steps of increasing resolution. First, we use 2D recursive neural networks to predict coarse contacts and orientations between secondary structure elements. Second, we use an energy-based method to align secondary structure elements and predict contact probabilities between residues in contacting alpha-helices or strands. Third, we use a deep neural network architecture to organize and progressively refine the prediction of contacts, integrating information over both space and time. We train the architecture on a large set of non-redundant proteins and test it on a large set of non-homologous domains, as well as on the set of protein domains used for contact prediction in the two most recent CASP8 and CASP9 experiments. For long-range contacts, the accuracy of the new CMAPpro predictor is close to 30%, a significant increase over existing approaches.
Availability: CMAPpro is available as part of the SCRATCH suite at http://scratch.proteomics.ics.uci.edu/.
Contact: pfbaldi@uci.edu
Supplementary information: Supplementary data are available at Bioinformatics online.
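To make the quantity behind the "long-range contact accuracy" figures more concrete, the following is a minimal illustrative sketch, not the CMAPpro implementation. The abstract does not state how contacts or long-range pairs are defined; the 8 Å distance cutoff and the minimum sequence separation of 24 residues used here are assumptions taken from common CASP practice. The function names `contact_map` and `long_range_accuracy`, and the `top_k` parameter, are hypothetical and introduced only for illustration.

```python
import math

# Assumed thresholds (common CASP conventions, not stated in the abstract).
CONTACT_CUTOFF = 8.0   # Angstroms between representative atoms
LONG_RANGE_SEP = 24    # minimum sequence separation for a long-range pair


def contact_map(coords, cutoff=CONTACT_CUTOFF):
    """Return a symmetric boolean matrix; True where two residues lie within cutoff."""
    n = len(coords)
    cmap = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) < cutoff:
                cmap[i][j] = cmap[j][i] = True
    return cmap


def long_range_accuracy(probs, true_map, top_k, min_sep=LONG_RANGE_SEP):
    """Fraction of the top_k highest-scoring long-range pairs that are true contacts."""
    n = len(true_map)
    # Collect every pair (i, j) separated by at least min_sep positions in sequence.
    pairs = [(probs[i][j], i, j) for i in range(n) for j in range(i + min_sep, n)]
    pairs.sort(key=lambda t: t[0], reverse=True)
    selected = pairs[:top_k]
    if not selected:
        return 0.0
    hits = sum(1 for _, i, j in selected if true_map[i][j])
    return hits / len(selected)
```

Under these assumptions, a predictor is scored by ranking all long-range residue pairs by predicted contact probability and counting how many of the top-ranked pairs are contacts in the native structure. Evaluations of this kind often set `top_k` relative to the sequence length, but that choice is likewise not specified in the abstract.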