Joint and individual variation explained (JIVE) for integrated analysis of multiple data types
Open Access
- 1 March 2013
- journal article
- Published by Institute of Mathematical Statistics in The Annals of Applied Statistics
- Vol. 7 (1), 523-542
- https://doi.org/10.1214/12-aoas597
Abstract
Research in several fields now requires the analysis of data sets in which multiple high-dimensional types of data are available for a common set of objects. In particular, The Cancer Genome Atlas (TCGA) includes data from several diverse genomic technologies on the same cancerous tumor samples. In this paper we introduce Joint and Individual Variation Explained (JIVE), a general decomposition of variation for the integrated analysis of such data sets. The decomposition consists of three terms: a low-rank approximation capturing joint variation across data types, low-rank approximations for structured variation individual to each data type, and residual noise. JIVE quantifies the amount of joint variation between data types, reduces the dimensionality of the data and provides new directions for the visual exploration of joint and individual structures. The proposed method represents an extension of Principal Component Analysis and has clear advantages over popular two-block methods such as Canonical Correlation Analysis and Partial Least Squares. A JIVE analysis of gene expression and miRNA data on Glioblastoma Multiforme tumor samples reveals gene–miRNA associations and provides better characterization of tumor types. Data and software are available at https://genome.unc.edu/jive/.
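To make the three-term decomposition concrete: for data blocks X_1, ..., X_k whose columns are the same objects, each block is written as X_i = J_i + A_i + E_i, where the stacked joint terms J_i share one low-rank structure, each A_i is a low-rank term specific to block i, and E_i is residual noise. The Python sketch below is one simple way such a decomposition could be estimated, by alternating truncated SVDs. It is an illustration under stated assumptions, not the software distributed with the paper: the names `low_rank` and `jive_sketch` are hypothetical, the ranks are assumed to be known in advance, and rank selection and block scaling are omitted.

```python
import numpy as np

def low_rank(M, rank):
    """Best rank-`rank` approximation of M, plus its top right singular vectors."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U[:, :rank] * s[:rank]) @ Vt[:rank], Vt[:rank]

def jive_sketch(blocks, joint_rank, indiv_ranks, n_iter=50):
    """Hypothetical helper: split each block into joint, individual, residual.

    blocks: list of (features x objects) arrays with the same number of columns.
    """
    # Row-center each block so variation is measured around the feature means.
    X = [np.asarray(B, dtype=float) for B in blocks]
    X = [B - B.mean(axis=1, keepdims=True) for B in X]
    n = X[0].shape[1]
    splits = np.cumsum([B.shape[0] for B in X])[:-1]
    A = [np.zeros_like(B) for B in X]

    for _ in range(n_iter):
        # Joint step: low-rank fit to the stacked data with individual terms removed.
        stacked = np.vstack([B - Ai for B, Ai in zip(X, A)])
        J_stacked, Vt = low_rank(stacked, joint_rank)
        J = np.split(J_stacked, splits, axis=0)
        # Individual step: low-rank fit per block, restricted to directions
        # orthogonal to the joint row space (assumed constraint in this sketch).
        P = np.eye(n) - Vt.T @ Vt
        A = [low_rank((B - Ji) @ P, r)[0] for B, Ji, r in zip(X, J, indiv_ranks)]

    E = [B - Ji - Ai for B, Ji, Ai in zip(X, J, A)]
    return J, A, E

# Synthetic example: two blocks sharing one joint pattern across 40 objects.
rng = np.random.default_rng(0)
n = 40
shared = rng.normal(size=(1, n))
X1 = (rng.normal(size=(30, 1)) @ shared
      + rng.normal(size=(30, 1)) @ rng.normal(size=(1, n))
      + 0.1 * rng.normal(size=(30, n)))
X2 = (rng.normal(size=(20, 1)) @ shared
      + rng.normal(size=(20, 1)) @ rng.normal(size=(1, n))
      + 0.1 * rng.normal(size=(20, n)))

J, A, E = jive_sketch([X1, X2], joint_rank=1, indiv_ranks=[1, 1])
for i, (Ji, Ai, Ei) in enumerate(zip(J, A, E), start=1):
    total = np.sum((Ji + Ai + Ei) ** 2)
    print(f"block {i}: joint {np.sum(Ji**2) / total:.2f}, "
          f"individual {np.sum(Ai**2) / total:.2f}, "
          f"residual {np.sum(Ei**2) / total:.2f}")
```

The printed ratios give a rough per-block summary of how much variation each term accounts for, which is the kind of quantity the abstract refers to when it says JIVE quantifies joint variation between data types.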