Integrating Biological Knowledge with Gene Expression Profiles for Survival Prediction of Cancer

1 February 2009

journal article
research article
Published by Mary Ann Liebert Inc in Journal of Computational Biology

Vol. 16 (2), 265-278
https://doi.org/10.1089/cmb.2008.12tt

Abstract

Because survival times vary widely among cancer patients and most genes on a microarray are unrelated to outcome, building accurate prediction models that are easy to interpret remains a challenge. In this paper, we propose a general strategy for improving the performance and interpretability of prediction models by integrating gene expression data with prior biological knowledge. First, we link gene identifiers in the expression dataset with gene annotation databases such as Gene Ontology (GO). Then we construct “supergenes” for each gene category by summarizing information from genes related to outcome using a modified principal component analysis (PCA) method. Finally, instead of using individual genes as predictors, we use these supergenes, each representing the information from one gene category, to predict survival outcome. In addition to identifying gene categories associated with outcome, the proposed approach carries out within-category selection to identify the important genes within each gene set. Using two real breast cancer microarray datasets, we show that prediction models built on gene set (or pathway) information outperform models based on the expression values of single genes, with improved prediction accuracy and interpretability.
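The three steps in the abstract (map genes to categories, summarize outcome-related genes in each category into a supergene, use supergenes as predictors) can be illustrated with a minimal sketch. This is not the authors' modified PCA. It assumes an uncensored numeric outcome, scores genes by absolute Pearson correlation in place of a survival-based score, and keeps a fixed fraction of genes per category. The function name `build_supergenes`, the `keep_fraction` parameter, and the dictionary-based category mapping are all hypothetical.

```python
import numpy as np


def build_supergenes(expr, gene_ids, categories, outcome, keep_fraction=0.5):
    """Illustrative only: one supergene per gene category.

    expr       : (n_samples, n_genes) expression matrix
    gene_ids   : list of gene identifiers, one per column of expr
    categories : dict mapping category name -> list of gene identifiers
                 (e.g. derived from GO annotations; hypothetical input format)
    outcome    : (n_samples,) numeric outcome, assumed uncensored here
    """
    expr = np.asarray(expr, dtype=float)
    outcome = np.asarray(outcome, dtype=float)

    # Standardize each gene so no single gene dominates the PCA.
    centered = expr - expr.mean(axis=0)
    scale = centered.std(axis=0)
    scale[scale == 0] = 1.0
    z = centered / scale

    # Gene-level score: |Pearson correlation| with the outcome
    # (an assumed stand-in for a survival-model score).
    y = outcome - outcome.mean()
    scores = np.abs(z.T @ y) / (len(y) * y.std())

    column_of = {g: i for i, g in enumerate(gene_ids)}
    supergenes, selected = {}, {}
    for name, members in categories.items():
        cols = [column_of[g] for g in members if g in column_of]
        if not cols:
            continue  # no gene of this category is on the array
        # Within-category selection: keep the top-scoring genes.
        n_keep = max(1, int(np.ceil(keep_fraction * len(cols))))
        top = sorted(cols, key=lambda c: scores[c], reverse=True)[:n_keep]
        # Supergene = first principal component of the selected genes.
        u, s, _ = np.linalg.svd(z[:, top], full_matrices=False)
        supergenes[name] = u[:, 0] * s[0]
        selected[name] = [gene_ids[c] for c in top]
    return supergenes, selected
```

The returned supergene vectors would then replace individual genes as covariates in a survival model such as Cox regression. Note that the sign of a principal component is arbitrary, so only the magnitude of a supergene's association with outcome is meaningful before model fitting.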

