PathScan: a tool for discerning mutational significance in groups of putative cancer genes
Open Access
- 14 April 2011
- journal article
- research article
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 27 (12), 1595-1602
- https://doi.org/10.1093/bioinformatics/btr193
Abstract
Motivation: The expansion of cancer genome sequencing continues to stimulate development of analytical tools for inferring relationships between somatic changes and tumor development. Pathway associations are especially consequential, but existing algorithms are demonstrably inadequate.
Methods: Here, we propose the PathScan significance test for the scenario where pathway mutations collectively contribute to tumor development. Its design addresses two aspects that established methods neglect. First, we account for variations in gene length and the consequent differences in their mutation probabilities under the standard null hypothesis of random mutation. The associated spike in computational effort is mitigated by accurate convolution-based approximation. Second, we combine individual probabilities into a multiple-sample value using Fisher–Lancaster theory, thereby improving differentiation between a few highly mutated genes and many genes having only a few mutations apiece. We investigate accuracy, computational effort and power, reporting acceptable performance for each.
Results: As an example calculation, we re-analyze KEGG-based lung adenocarcinoma pathway mutations from the Tumor Sequencing Project. Our test recapitulates the most significant pathways and finds that others for which the original test battery was inconclusive are not actually significant. It also identifies the focal adhesion pathway as being significantly mutated, a finding consistent with earlier studies. We also expand this analysis to other databases: Reactome, BioCarta, Pfam, PID and SMART, finding additional hits in ErbB and EPHA signaling pathways and regulation of telomerase. All have implications and plausible mechanistic roles in cancer. Finally, we discuss aspects of extending the method to integrate gene-specific background rates and other types of genetic anomalies.
Availability: PathScan is implemented in Perl and is available from the Genome Institute at: http://genome.wustl.edu/software/pathscan.
Contact: mwendl@wustl.edu
Supplementary information: Supplementary data are available at Bioinformatics online.
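The two ideas named in the Methods portion of the abstract can be illustrated with a minimal Python sketch. This is not the PathScan implementation (which is in Perl) and is not taken from the paper; it only shows the general shape of such a test under stated assumptions. It assumes a single uniform per-base background rate, counts a gene as mutated if it carries at least one mutation, computes the distribution of the number of mutated genes by exact convolution (the abstract mentions a convolution-based approximation, whose details are not given here), and combines per-sample p-values with plain Fisher combination, omitting Lancaster's correction for discrete p-values. All function and parameter names are hypothetical.

```python
import math


def gene_mutation_prob(length_bp, rate):
    """P(gene has at least one mutation), assuming a uniform per-base rate."""
    return 1.0 - (1.0 - rate) ** length_bp


def mutated_gene_count_pmf(probs):
    """PMF of the number of mutated genes in a pathway for one sample.

    Genes have unequal probabilities (longer genes are hit more often),
    so the count is Poisson-binomial; it is built by convolving one
    Bernoulli distribution per gene.
    """
    pmf = [1.0]  # with zero genes, the count is 0 with certainty
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1.0 - p)   # this gene is not mutated
            nxt[k + 1] += mass * p       # this gene is mutated
        pmf = nxt
    return pmf


def sample_p_value(probs, observed):
    """Upper-tail probability of seeing >= `observed` mutated genes."""
    pmf = mutated_gene_count_pmf(probs)
    return min(1.0, sum(pmf[observed:]))


def fisher_combine(p_values):
    """Combine independent p-values with Fisher's method.

    The statistic X = -2 * sum(ln p) is chi-square with 2n degrees of
    freedom under the null; for even degrees of freedom the survival
    function has the closed form exp(-X/2) * sum_{i<n} (X/2)^i / i!.
    Assumes every p-value is strictly positive.
    """
    n = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)
    term, total = 1.0, 1.0
    for i in range(1, n):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total


if __name__ == "__main__":
    # Hypothetical pathway: gene lengths in base pairs and a made-up rate.
    lengths = [1_200, 4_500, 30_000, 900]
    rate = 1e-6
    probs = [gene_mutation_prob(n, rate) for n in lengths]
    # Hypothetical observations: mutated-gene counts in three samples.
    per_sample = [sample_p_value(probs, k) for k in (1, 0, 2)]
    print(per_sample, fisher_combine(per_sample))
```

Combining per-sample tail probabilities, rather than pooling all mutations into one count, is what lets a test of this kind distinguish a few heavily mutated genes from many genes with a few mutations each, as the abstract describes.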