High-throughput prediction of protein antigenicity using protein microarray data

Open Access

7 October 2010

journal article
research article
Published by Oxford University Press (OUP) in Bioinformatics

Vol. 26 (23), 2936-2943
https://doi.org/10.1093/bioinformatics/btq551

Abstract

Motivation: Discovery of novel protective antigens is fundamental to the development of vaccines for existing and emerging pathogens. Most computational methods for predicting protein antigenicity rely directly on homology with previously characterized protective antigens; however, homology-based methods will fail to discover truly novel protective antigens. Thus, there is a significant need for homology-free methods capable of screening entire proteomes for the antigens most likely to generate a protective humoral immune response.

Results: Here we begin by curating two types of positive data: (i) antigens that elicit a strong antibody response in protected individuals but not in unprotected individuals, using human immunoglobulin reactivity data obtained from protein microarray analyses; and (ii) known protective antigens from the literature. The resulting datasets are used to train a sequence-based prediction model, ANTIGENpro, to predict the likelihood that a protein is a protective antigen. ANTIGENpro correctly classifies 82% of the known protective antigens when trained using only the protein microarray datasets. The accuracy on the combined dataset is estimated at 76% by cross-validation experiments. Finally, ANTIGENpro performs well when evaluated on an external pathogen proteome for which protein microarray data were obtained after the initial development of ANTIGENpro.

Availability: ANTIGENpro is integrated in the SCRATCH suite of predictors available at http://scratch.proteomics.ics.uci.edu.

Contact: pfbaldi@ics.uci.edu
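The abstract reports an accuracy estimated by cross-validation for a sequence-based classifier. As a rough illustration of that evaluation idea only, the sketch below runs k-fold cross-validation on a toy problem. Everything in it is an assumption made for illustration: the amino-acid composition features, the nearest-centroid classifier, the function names, and the six made-up sequences are hypothetical and are not ANTIGENpro's features, model, or data.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(seq):
    """Return the 20-dimensional amino-acid frequency vector of a sequence."""
    counts = Counter(seq.upper())
    total = sum(counts[a] for a in AMINO_ACIDS) or 1  # avoid division by zero
    return [counts[a] / total for a in AMINO_ACIDS]


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def sq_dist(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def predict(train, seq):
    """Nearest-centroid prediction; train is a list of (sequence, label) pairs.

    Assumes both labels (0 and 1) occur in the training set.
    """
    x = composition(seq)
    centroids = {
        label: centroid([composition(s) for s, y in train if y == label])
        for label in (0, 1)
    }
    return min(centroids, key=lambda label: sq_dist(x, centroids[label]))


def cross_validate(data, k):
    """Estimate accuracy by k-fold cross-validation (interleaved folds)."""
    folds = [data[i::k] for i in range(k)]
    correct = 0
    for i, held_out in enumerate(folds):
        # Train on every fold except the held-out one.
        train = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
        correct += sum(predict(train, s) == y for s, y in held_out)
    return correct / len(data)


# Hypothetical toy data: label 1 = "antigen", label 0 = "non-antigen".
TOY = [
    ("KKEEKDKE", 1), ("LLVVAILV", 0),
    ("EKKDEEKK", 1), ("AVLLIVVA", 0),
    ("DKEKKEED", 1), ("VILLAVLA", 0),
]

if __name__ == "__main__":
    print(f"3-fold accuracy on toy data: {cross_validate(TOY, 3):.2f}")
```

Each example is held out exactly once, so the reported figure is the fraction of held-out examples classified correctly. A real evaluation would also need folds that do not share homologous sequences, which this toy split does not attempt.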

