Benchmarking joint multi-omics dimensionality reduction approaches for the study of cancer
Open Access
- 5 January 2021
- journal article
- research article
- Published by Springer Science and Business Media LLC in Nature Communications
- Vol. 12 (1), 1-12
- https://doi.org/10.1038/s41467-020-20430-7
Abstract
High-dimensional multi-omics data are now standard in biology. They can greatly enhance our understanding of biological systems when effectively integrated. To achieve proper integration, joint Dimensionality Reduction (jDR) methods are among the most efficient approaches. However, several jDR methods are available, urging the need for a comprehensive benchmark with practical guidelines. We perform a systematic evaluation of nine representative jDR methods using three complementary benchmarks. First, we evaluate their performances in retrieving ground-truth sample clustering from simulated multi-omics datasets. Second, we use TCGA cancer data to assess their strengths in predicting survival, clinical annotations and known pathways/biological processes. Finally, we assess their classification of multi-omics single-cell data. From these in-depth comparisons, we observe that intNMF performs best in clustering, while MCIA offers an effective behavior across many contexts. The code developed for this benchmark study is implemented in a Jupyter notebook, multi-omics mix (momix), to foster reproducibility, and support users and future developers.