High-throughput prediction of protein antigenicity using protein microarray data
Open Access
- 7 October 2010
- journal article
- research article
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 26 (23), 2936-2943
- https://doi.org/10.1093/bioinformatics/btq551
Abstract
Motivation: Discovery of novel protective antigens is fundamental to the development of vaccines for existing and emerging pathogens. Most computational methods for predicting protein antigenicity rely directly on homology with previously characterized protective antigens; however, homology-based methods will fail to discover truly novel protective antigens. Thus, there is a significant need for homology-free methods capable of screening entire proteomes for the antigens most likely to generate a protective humoral immune response.
Results: Here we begin by curating two types of positive data: (i) antigens that elicit a strong antibody response in protected individuals but not in unprotected individuals, using human immunoglobulin reactivity data obtained from protein microarray analyses; and (ii) known protective antigens from the literature. The resulting datasets are used to train a sequence-based prediction model, ANTIGENpro, to predict the likelihood that a protein is a protective antigen. ANTIGENpro correctly classifies 82% of the known protective antigens when trained using only the protein microarray datasets. The accuracy on the combined dataset is estimated at 76% by cross-validation experiments. Finally, ANTIGENpro performs well when evaluated on an external pathogen proteome for which protein microarray data were obtained after the initial development of ANTIGENpro.
Availability: ANTIGENpro is integrated in the SCRATCH suite of predictors available at http://scratch.proteomics.ics.uci.edu.
Contact: pfbaldi@ics.uci.edu