A Statistical Framework for Modeling HLA-Dependent T Cell Response Data

Abstract
The identification of T cell epitopes and their HLA (human leukocyte antigen) restrictions is important for applications such as the design of cellular vaccines for HIV. Traditional methods for such identification are costly and time-consuming. Recently, a more expeditious laboratory technique using ELISpot assays has been developed that allows rapid screening of specific responses. However, this assay does not directly provide information about the HLA restriction of a response, a critical piece of information for vaccine design. We therefore introduce, apply, and validate a statistical model for identifying HLA-restricted epitopes from ELISpot data. By combining patterns of response across a broad range of donors with our statistical model, we can determine (probabilistically) which HLA alleles are likely to be responsible for the observed reactivities. We can also provide a good estimate of the number of false positives generated by our analysis (i.e., the false discovery rate). This model allows us to learn about new HLA-restricted epitopes from ELISpot data in an efficient, cost-effective, and high-throughput manner. We applied our approach to data from donors infected with HIV and identified many potential new HLA restrictions. Among 134 such predictions, six were confirmed in the lab and the remainder could not be ruled out. These results shed light on the extent of HLA class I promiscuity, which has significant implications for the understanding of HLA class I antigen presentation and vaccine development.

At the core of the human adaptive immune response is a train-to-kill mechanism in which specialized immune cells are sensitized to recognize small peptides from foreign pathogens (e.g., HIV). Following this sensitization, these cells are activated to kill other cells that display the same peptide (and that are infected by the same pathogen). However, for sensitization and killing to occur, the pathogen peptide must be “paired up” with another of the infected person's specialized immune molecules, an HLA (human leukocyte antigen) molecule. The way in which pathogen peptides interact with these HLA molecules determines whether and how an immune response will be generated, which has implications for vaccine design, where one may artificially introduce select peptides to pre-train the immune system. Furthermore, there is a huge repertoire of such HLA molecules, with almost no two people having the same set. We introduce a statistical approach for identifying which HLA molecules interact with which pathogen peptides, given a particular kind of laboratory data. Our approach takes as input data that tell us only which pathogen peptides generate a response, not which HLA molecules support the response. Our statistical approach fills in this missing information.
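To make the idea of "looking at patterns across donors" concrete, the following is a minimal sketch and not the statistical model introduced in this paper. It substitutes a much simpler technique, a one-sided Fisher's exact test applied to each allele separately, to ask whether donors carrying a given HLA allele respond to one peptide more often than donors who do not carry it. The donor records, allele names, and function names are invented for illustration only.

```python
from math import comb

def fisher_one_sided(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table
    [[a, b], [c, d]] = [[carrier & responder, carrier & non-responder],
                        [non-carrier & responder, non-carrier & non-responder]].
    Returns P(X >= a) under the hypergeometric null of no association."""
    n_carriers = a + b
    n_responders = a + c
    n = a + b + c + d
    upper = min(n_carriers, n_responders)
    tail = sum(
        comb(n_carriers, k) * comb(n - n_carriers, n_responders - k)
        for k in range(a, upper + 1)
    )
    return tail / comb(n, n_responders)

# Hypothetical toy data: (HLA alleles carried by the donor, response to one peptide).
donors = [
    ({"A*02", "B*57"}, True),
    ({"A*02", "B*07"}, False),
    ({"B*57", "A*03"}, True),
    ({"A*03", "B*07"}, False),
    ({"B*57", "A*24"}, True),
    ({"A*02", "A*24"}, False),
    ({"A*24", "B*07"}, False),
    ({"B*57", "B*07"}, True),
]

def allele_p_values(donors):
    """Compute a one-sided p-value per allele from donor-level data."""
    alleles = set().union(*(carried for carried, _ in donors))
    result = {}
    for allele in sorted(alleles):
        a = sum(1 for carried, resp in donors if allele in carried and resp)
        b = sum(1 for carried, resp in donors if allele in carried and not resp)
        c = sum(1 for carried, resp in donors if allele not in carried and resp)
        d = sum(1 for carried, resp in donors if allele not in carried and not resp)
        result[allele] = fisher_one_sided(a, b, c, d)
    return result

p_values = allele_p_values(donors)
for allele, p in sorted(p_values.items(), key=lambda item: item[1]):
    print(f"{allele}: p = {p:.4f}")
```

In this toy data, every responder carries B*57 and no non-responder does, so that allele receives the smallest p-value. A per-allele test of this kind treats each allele in isolation, so it cannot by itself separate the contributions of several alleles carried by the same donor, and it provides no false discovery rate estimate without a further correction step.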