Evaluation of public cancer datasets and signatures identifies TP53 mutant signatures with robust prognostic and predictive value

Abstract
Systematic analysis of cancer gene-expression patterns using high-throughput transcriptional profiling technologies has led to the discovery and publication of hundreds of gene-expression signatures. However, the value of few public signatures has been cross-validated across multiple studies for the prediction of cancer prognosis and chemosensitivity in the neoadjuvant setting. To analyze the prognostic and predictive values of publicly available signatures, we implemented a systematic method for high-throughput and efficient validation of a large number of datasets and gene-expression signatures. Using this method, we performed a meta-analysis including 351 publicly available signatures, 37,000 random signatures, and 31 breast cancer datasets. Survival analyses and pathologic responses were used to assess prediction of prognosis, chemoresponsiveness, and chemo-drug sensitivity. Among the 31 breast cancer datasets and 351 public signatures, we identified 22 validation datasets, two robust prognostic signatures in breast cancer (BRmet50 and PMID18271932Sig33), and one signature (PMID20813035Sig137) specific for prognosis prediction in patients with ER-negative tumors. The 22 validation datasets demonstrated enhanced ability to distinguish cancer gene profiles from random gene profiles. Both robust prognostic signatures are composed of genes associated with TP53 mutations and successfully stratified the good and poor prognostic groups in 82% and 68% of the 22 validation datasets, respectively. We then assessed the ability of the two signatures to predict treatment responses of breast cancer patients treated with commonly used chemotherapeutic regimens. Both BRmet50 and PMID18271932Sig33 retrospectively identified those patients with an insensitive response to neoadjuvant chemotherapy (mean positive predictive values 85%-88%). Among those patients predicted to be treatment sensitive, distant relapse-free survival (DRFS) was improved (negative predictive values 87%-88%).
BRmet50 was further shown to prospectively predict taxane-anthracycline sensitivity in patients with HER2-negative (HER2-) breast cancer. We have developed and applied a high-throughput screening method for public cancer signature validation. Using this method, we identified appropriate datasets for cross-validation and two robust signatures that differentiate TP53 mutation status and have prognostic and predictive value for breast cancer patients.
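The positive and negative predictive values quoted above, and the comparison against random signatures, can be illustrated with a minimal Python sketch. This is not the authors' pipeline. It assumes that a "positive" call means a patient predicted to be treatment insensitive, that the observed outcome is a binary event such as distant relapse, and that each signature has already been reduced to a single performance score per dataset. The function names `predictive_values` and `fraction_of_random_outperformed` are hypothetical.

```python
from typing import Sequence, Tuple


def predictive_values(predicted_insensitive: Sequence[bool],
                      observed_event: Sequence[bool]) -> Tuple[float, float]:
    """Return (PPV, NPV) for binary predictions against binary outcomes.

    Assumption: True in `predicted_insensitive` is the "positive" call and
    True in `observed_event` means the adverse outcome actually occurred.
    """
    if len(predicted_insensitive) != len(observed_event):
        raise ValueError("inputs must have the same length")

    tp = fp = tn = fn = 0
    for pred, obs in zip(predicted_insensitive, observed_event):
        if pred and obs:
            tp += 1          # predicted insensitive, event observed
        elif pred and not obs:
            fp += 1          # predicted insensitive, no event
        elif not pred and obs:
            fn += 1          # predicted sensitive, event observed
        else:
            tn += 1          # predicted sensitive, no event

    if tp + fp == 0 or tn + fn == 0:
        raise ValueError("need at least one positive and one negative call")

    ppv = tp / (tp + fp)     # fraction of positive calls that were correct
    npv = tn / (tn + fn)     # fraction of negative calls that were correct
    return ppv, npv


def fraction_of_random_outperformed(signature_score: float,
                                    random_scores: Sequence[float]) -> float:
    """Fraction of random signatures whose score is strictly lower.

    Assumption: a higher score means better stratification; how the score
    is derived (e.g. from a survival test) is left outside this sketch.
    """
    if not random_scores:
        raise ValueError("random_scores must not be empty")
    beaten = sum(1 for s in random_scores if s < signature_score)
    return beaten / len(random_scores)


# Toy data for illustration only; these are not values from the study.
predicted = [True, True, True, False, False, False, False, True]
observed = [True, True, False, False, False, True, False, True]
ppv, npv = predictive_values(predicted, observed)
share = fraction_of_random_outperformed(3.0, [1.0, 2.0, 3.5, 0.5])
```

In this sketch a signature would be considered informative on a dataset only if it outperforms most random signatures of comparable size. The threshold for "most" is a design choice that the abstract does not specify.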