Simultaneous Identification of Multiple Driver Pathways in Cancer

Abstract
Distinguishing the somatic mutations responsible for cancer (driver mutations) from random, passenger mutations is a key challenge in cancer genomics. Driver mutations generally target cellular signaling and regulatory pathways consisting of multiple genes. This heterogeneity complicates the identification of driver mutations by their recurrence across samples, as different combinations of mutations in driver pathways are observed in different samples. We introduce the Multi-Dendrix algorithm for the simultaneous identification of multiple driver pathways de novo in somatic mutation data from a cohort of cancer samples. The algorithm relies on two combinatorial properties of mutations in a driver pathway: high coverage and mutual exclusivity. We derive an integer linear program that finds sets of mutations exhibiting these properties. We apply Multi-Dendrix to somatic mutations from glioblastoma, breast cancer, and lung cancer samples. Multi-Dendrix identifies sets of mutations in genes that overlap with known pathways (including the Rb, p53, PI(3)K, and cell cycle pathways) as well as novel sets of mutually exclusive mutations, including mutations in several transcription factors or other genes involved in transcriptional regulation. These sets are discovered directly from mutation data with no prior knowledge of pathways or gene interactions. We show that Multi-Dendrix outperforms other algorithms for identifying combinations of mutations and is also orders of magnitude faster on genome-scale data. Software available at: http://compbio.cs.brown.edu/software.

Author Summary
Cancer is a disease driven largely by the accumulation of somatic mutations during the lifetime of an individual. The declining costs of genome sequencing now permit the measurement of somatic mutations in hundreds of cancer genomes. A key challenge is to distinguish driver mutations responsible for cancer from random passenger mutations. This challenge is compounded by the observation that different combinations of driver mutations are observed in different patients with the same cancer type. One reason for this heterogeneity is that driver mutations target signaling and regulatory pathways, which have multiple points of failure. We introduce an algorithm, Multi-Dendrix, to find these pathways solely from patterns of mutual exclusivity between mutations across a cohort of patients. Unlike earlier approaches, we simultaneously find multiple pathways, an essential feature for analyzing cancer genomes, where multiple pathways are typically perturbed. We apply our algorithm to mutation data from hundreds of glioblastoma, breast cancer, and lung adenocarcinoma patients. We identify sets of interacting genes that overlap known pathways, and gene sets containing subtype-specific mutations. These results show that multiple cancer pathways can be identified directly from patterns in mutation data, and provide an approach to analyze the ever-growing cancer mutation datasets.
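
The two properties the algorithm relies on, high coverage and mutual exclusivity, can be made concrete with a small sketch. The code below is an illustration only, not the Multi-Dendrix implementation: the gene and sample names are hypothetical toy data, and the scoring function `weight` (coverage minus a penalty for samples hit more than once) is an assumed way of combining the two properties that the text above does not itself specify.

```python
from typing import Dict, Iterable, Set

# Hypothetical toy data: gene -> set of samples in which the gene is mutated.
toy_mutations: Dict[str, Set[str]] = {
    "GENE_A": {"s1", "s2"},
    "GENE_B": {"s3", "s4"},
    "GENE_C": {"s1", "s3"},
}

def coverage(mutations: Dict[str, Set[str]], gene_set: Iterable[str]) -> Set[str]:
    """Samples carrying a mutation in at least one gene of gene_set."""
    covered: Set[str] = set()
    for gene in gene_set:
        covered |= mutations.get(gene, set())
    return covered

def coverage_overlap(mutations: Dict[str, Set[str]], gene_set: Iterable[str]) -> int:
    """Number of 'extra' mutations: zero exactly when the set is mutually exclusive."""
    genes = list(gene_set)
    total = sum(len(mutations.get(gene, set())) for gene in genes)
    return total - len(coverage(mutations, genes))

def weight(mutations: Dict[str, Set[str]], gene_set: Iterable[str]) -> int:
    """Assumed score: reward coverage, penalize overlapping mutations."""
    genes = list(gene_set)
    return len(coverage(mutations, genes)) - coverage_overlap(mutations, genes)

# GENE_A and GENE_B cover four samples with no overlap.
exclusive_score = weight(toy_mutations, ["GENE_A", "GENE_B"])
# GENE_A and GENE_C share sample s1, so the set covers fewer samples and is penalized.
overlapping_score = weight(toy_mutations, ["GENE_A", "GENE_C"])
```

In this toy cohort the mutually exclusive pair scores higher than the overlapping pair. An optimization procedure such as the integer linear program mentioned above would search over many candidate gene sets at once rather than score them one at a time.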