De novo peptide sequencing by deep learning
- 18 July 2017
- journal article
- research article
- Published by Proceedings of the National Academy of Sciences in Proceedings of the National Academy of Sciences of the United States of America
- Vol. 114 (31), 8247-8252
- https://doi.org/10.1073/pnas.1705691114
Abstract
De novo peptide sequencing from tandem MS data is the key technology in proteomics for the characterization of proteins, especially for new sequences, such as mAbs. In this study, we propose a deep neural network model, DeepNovo, for de novo peptide sequencing. The DeepNovo architecture combines recent advances in convolutional neural networks and recurrent neural networks to learn features of tandem mass spectra, fragment ions, and sequence patterns of peptides. The networks are further integrated with local dynamic programming to solve the complex optimization task of de novo sequencing. We evaluated the method on a wide variety of species and found that DeepNovo considerably outperformed state-of-the-art methods, achieving 7.7–22.9% higher accuracy at the amino acid level and 38.1–64.0% higher accuracy at the peptide level. We further used DeepNovo to automatically reconstruct the complete sequences of antibody light and heavy chains of mouse, achieving 97.5–100% coverage and 97.2–99.5% accuracy, without assisting databases. Moreover, DeepNovo is retrainable to adapt to any source of data and provides a complete end-to-end training and prediction solution to the de novo sequencing problem. Not only does our study extend the deep learning revolution to a new field, but it also shows an innovative approach to solving optimization problems by using deep learning and dynamic programming.
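The abstract does not spell out how the dynamic programming step works. One plausible reading is a knapsack-style mass filter: at each prediction step, keep only the amino acids whose mass leaves a remainder that some combination of residues can still fill. The Python sketch below illustrates that idea under stated assumptions and is not the authors' implementation. The function names, the 0.01 Da bin resolution, the tolerance, and the five-residue alphabet are all hypothetical choices made for brevity; only the monoisotopic residue masses are standard values.

```python
# Monoisotopic residue masses in daltons (small illustrative subset, not the full alphabet).
RESIDUE_MASS = {
    "G": 57.02146,
    "A": 71.03711,
    "S": 87.03203,
    "P": 97.05276,
    "V": 99.06841,
}

RESOLUTION = 100  # assumed discretization: 100 bins per dalton (0.01 Da per bin)


def to_bin(mass):
    """Convert a mass in daltons to an integer bin index."""
    return int(round(mass * RESOLUTION))


def reachable_masses(max_mass):
    """Knapsack table: reachable[m] is True if some multiset of residues sums to bin m."""
    size = to_bin(max_mass) + 1
    reachable = [False] * size
    reachable[0] = True  # the empty suffix has mass zero
    bins = [to_bin(m) for m in RESIDUE_MASS.values()]
    for m in range(1, size):
        reachable[m] = any(m >= b and reachable[m - b] for b in bins)
    return reachable


def feasible_next_residues(remaining_mass, reachable, tolerance_bins=2):
    """Return residues whose removal leaves a suffix mass that the table can fill."""
    feasible = []
    for aa, mass in RESIDUE_MASS.items():
        left = to_bin(remaining_mass - mass)
        lo, hi = left - tolerance_bins, left + tolerance_bins
        if hi < 0:
            continue  # residue is heavier than the remaining mass
        window = range(max(lo, 0), min(hi, len(reachable) - 1) + 1)
        if any(reachable[m] for m in window):
            feasible.append(aa)
    return feasible


if __name__ == "__main__":
    table = reachable_masses(200.0)
    # Remaining mass equal to one glycine plus one alanine residue.
    print(feasible_next_residues(57.02146 + 71.03711, table))
```

In a full system, a filter of this kind would mask the network's per-residue scores so that only mass-consistent candidates are considered at each step; how DeepNovo combines the two in detail is described in the paper itself, not in this abstract.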
Funding Information
- Canadian Network for Research and Innovation in Machining Technology, Natural Sciences and Engineering Research Council of Canada (OGP0046506)
- Canada Research Chairs (OGP0046506)