Integration of somatic mutation, expression and functional data reveals potential driver genes predictive of breast cancer survival
Open Access
- 24 March 2015
- journal article
- research article
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 31 (16), 2607-2613
- https://doi.org/10.1093/bioinformatics/btv164
Abstract
Motivation: Genome and transcriptome analyses can be used to explore cancers comprehensively, and it is increasingly common to have multiple omics data measured from each individual. Furthermore, there are rich functional data such as predicted impact of mutations on protein coding and gene/protein networks. However, integration of the complex information across the different omics and functional data is still challenging. Clinical validation, particularly based on patient outcomes such as survival, is important for assessing the relevance of the integrated information and for comparing different procedures.
Results: An analysis pipeline is built for integrating genomic and transcriptomic alterations from whole-exome and RNA sequence data and functional data from protein function prediction and gene interaction networks. The method accumulates evidence for the functional implications of mutated potential driver genes found within and across patients. A driver-gene score (DGscore) is developed to capture the cumulative effect of such genes. To contribute to the score, a gene has to be frequently mutated, with high or moderate mutational impact at protein level, exhibiting an extreme expression and functionally linked to many differentially expressed neighbors in the functional gene network. The pipeline is applied to 60 matched tumor and normal samples of the same patient from The Cancer Genome Atlas breast-cancer project. In clinical validation, patients with high DGscores have worse survival than those with low scores (P = 0.001). Furthermore, the DGscore outperforms the established expression-based signatures MammaPrint and PAM50 in predicting patient survival. In conclusion, integration of mutation, expression and functional data allows identification of clinically relevant potential driver genes in cancer.
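The four conditions a gene must meet to contribute to the DGscore can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the `GeneEvidence` record, the impact labels, the use of a z-score for "extreme expression", and all three thresholds are hypothetical choices made for illustration only, since the abstract does not state cutoffs. The score here is assumed to be a simple per-patient count of qualifying mutated genes.

```python
from dataclasses import dataclass


@dataclass
class GeneEvidence:
    """Per-patient evidence for one gene (hypothetical record layout)."""
    gene: str
    mutated_in_patient: bool
    cohort_mutation_frequency: float  # fraction of patients carrying a mutation
    impact: str                       # assumed labels: "HIGH", "MODERATE", "LOW"
    expression_z: float               # assumed standardized tumor expression
    de_neighbors: int                 # differentially expressed network neighbors


# Hypothetical thresholds; the abstract gives no numeric cutoffs.
MIN_FREQUENCY = 0.05
MIN_ABS_Z = 2.0
MIN_DE_NEIGHBORS = 10


def is_potential_driver(g: GeneEvidence) -> bool:
    """Return True only if all four criteria from the abstract are met."""
    return (
        g.mutated_in_patient
        and g.cohort_mutation_frequency >= MIN_FREQUENCY  # frequently mutated
        and g.impact in {"HIGH", "MODERATE"}              # protein-level impact
        and abs(g.expression_z) >= MIN_ABS_Z              # extreme expression
        and g.de_neighbors >= MIN_DE_NEIGHBORS            # many DE neighbors
    )


def dg_score(genes: list[GeneEvidence]) -> int:
    """Assumed scoring rule: count the qualifying genes for one patient."""
    return sum(is_potential_driver(g) for g in genes)


# Made-up example data for a single patient.
patient = [
    GeneEvidence("GENE_A", True, 0.10, "HIGH", 2.5, 15),       # qualifies
    GeneEvidence("GENE_B", True, 0.10, "LOW", 3.0, 20),        # low impact
    GeneEvidence("GENE_C", True, 0.02, "MODERATE", -2.8, 12),  # rarely mutated
    GeneEvidence("GENE_D", True, 0.08, "MODERATE", -2.2, 11),  # qualifies
    GeneEvidence("GENE_E", False, 0.20, "HIGH", 3.1, 30),      # not mutated here
]

print(dg_score(patient))
```

Patients could then be split into high- and low-score groups for a survival comparison, as the abstract describes; how that split is made is not specified here.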
Availability and implementation: The documented pipeline including annotated sample scripts can be found at http://fafner.meb.ki.se/biostatwiki/driver-genes/.
Contact: yudi.pawitan@ki.se
Supplementary information: Supplementary data are available at Bioinformatics online.