Semi-supervised recursively partitioned mixture models for identifying cancer subtypes
Open Access
- 16 August 2010
- journal article
- research article
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 26 (20), 2578-2585
- https://doi.org/10.1093/bioinformatics/btq470
Abstract
Motivation: Patients with identical cancer diagnoses often progress differently. The disparity we see in disease progression and treatment response can be attributed to the idea that two histologically similar cancers may be completely different diseases at the molecular level. Methods for identifying cancer subtypes associated with patient survival have the capacity to be powerful instruments for understanding the biochemical processes that underlie disease progression, as well as providing an initial step toward more personalized therapy for cancer patients. We propose a method called semi-supervised recursively partitioned mixture models (SS-RPMM) that utilizes array-based genetic and patient-level clinical data for finding cancer subtypes that are associated with patient survival.
Results: In the proposed SS-RPMM, cancer subtypes are identified using a selected subset of genes that are associated with survival time. Since survival information is used in the gene selection step, this method is semi-supervised. Unlike other semi-supervised clustering classification methods, SS-RPMM does not require specification of the number of cancer subtypes, which is often unknown. In a simulation study, our proposed method compared favorably with other competing semi-supervised methods, including semi-supervised clustering and supervised principal components analysis. Furthermore, an analysis of mesothelioma cancer data using SS-RPMM revealed at least two distinct methylation profiles that are informative for survival.
Availability: The analyses implemented in this article were carried out using R (http://www.r.project.org/).
Contact: devin_koestler@brown.edu; e_andres_houseman@brown.edu
Supplementary information: Supplementary data are available at Bioinformatics online.
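The two stages named in the Results paragraph (survival-guided feature selection, then recursive partitioning without a preset number of subtypes) can be illustrated with a minimal sketch. It is not the authors' implementation, and it is written in Python rather than R purely for illustration. Several simplifications are assumptions made here: features are ranked by absolute Pearson correlation with survival time (censoring is ignored), a hard two-means split stands in for fitting a two-component mixture, and a Gaussian BIC comparison decides whether a split is kept. All function names and the synthetic data are hypothetical.

```python
import math
import random
from statistics import mean, pvariance


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0


def select_features(data, survival, top_k):
    """Supervised step (assumed score): keep the top_k columns most correlated with survival."""
    scores = [abs(pearson([row[j] for row in data], survival))
              for j in range(len(data[0]))]
    return sorted(range(len(scores)), key=lambda j: -scores[j])[:top_k]


def log_lik(rows):
    """Maximized log-likelihood of an independent Gaussian fitted to each column."""
    n = len(rows)
    total = 0.0
    for j in range(len(rows[0])):
        var = max(pvariance([r[j] for r in rows]), 1e-6)  # floor avoids log(0)
        total -= 0.5 * n * (math.log(2 * math.pi * var) + 1)
    return total


def two_means(rows, iters=20):
    """Hard two-way split; a stand-in for a two-component mixture fit."""
    def dist2(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    centre = [mean(col) for col in zip(*rows)]
    c0 = max(rows, key=lambda r: dist2(r, centre))  # row farthest from the mean
    c1 = max(rows, key=lambda r: dist2(r, c0))      # row farthest from c0
    labels = []
    for _ in range(iters):
        labels = [0 if dist2(r, c0) <= dist2(r, c1) else 1 for r in rows]
        groups = [[r for r, l in zip(rows, labels) if l == k] for k in (0, 1)]
        if not groups[0] or not groups[1]:
            break
        c0, c1 = ([mean(col) for col in zip(*g)] for g in groups)
    return labels


def partition(rows, idx, min_size=5):
    """Unsupervised step: split idx in two recursively while the split lowers BIC."""
    sub = [rows[i] for i in idx]
    n, d = len(sub), len(sub[0])
    if n < 2 * min_size:
        return [idx]
    labels = two_means(sub)
    left = [i for i, l in zip(idx, labels) if l == 0]
    right = [i for i, l in zip(idx, labels) if l == 1]
    if min(len(left), len(right)) < min_size:
        return [idx]
    # One-cluster model: a mean and a variance per column (2d parameters).
    bic_one = -2 * log_lik(sub) + 2 * d * math.log(n)
    # Two-cluster model: 4d parameters plus one mixing proportion.
    ll_two = sum(log_lik([rows[i] for i in part]) + len(part) * math.log(len(part) / n)
                 for part in (left, right))
    bic_two = -2 * ll_two + (4 * d + 1) * math.log(n)
    if bic_two >= bic_one:
        return [idx]
    return partition(rows, left, min_size) + partition(rows, right, min_size)


# Synthetic example: two subtypes differing in features 0 and 1 and in survival.
random.seed(0)
groups = [0] * 20 + [1] * 20
data = [[random.gauss(8 * g, 1), random.gauss(8 * g, 1)] +
        [random.gauss(0, 1) for _ in range(3)] for g in groups]
survival = [random.gauss(12 - 8 * g, 1) for g in groups]

selected = select_features(data, survival, top_k=2)
reduced = [[row[j] for j in selected] for row in data]
clusters = partition(reduced, list(range(len(data))))
print(sorted(selected), [len(c) for c in clusters])
```

The number of clusters is never passed in: the recursion stops when a further split no longer improves BIC or would produce a group smaller than `min_size`, which mirrors the property the abstract highlights. The survival score, the mixture family, and the stopping rule would all need to be replaced to match the published method.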