Multivariate regression analysis of distance matrices for testing associations between gene expression patterns and related variables

19 December 2006

journal article
Published by Proceedings of the National Academy of Sciences in Proceedings of the National Academy of Sciences of the United States of America

Vol. 103 (51), 19430-19435
https://doi.org/10.1073/pnas.0609333103

Abstract

A fundamental step in the analysis of gene expression and other high-dimensional genomic data is the calculation of the similarity or distance between pairs of individual samples in a study. If one has collected N total samples and assayed the expression level of G genes on those samples, then an N × N similarity matrix can be formed that reflects the correlation or similarity of the samples with respect to the expression values over the G genes. This matrix can then be examined for patterns via standard data reduction and cluster analysis techniques. We consider an alternative to conventional data reduction and cluster analyses of similarity matrices that is rooted in traditional linear models. This analysis method allows predictor variables collected on the samples to be related to variation in the pairwise similarity/distance values reflected in the matrix. The proposed multivariate method avoids the need for reducing the dimensions of a similarity matrix, can be used to assess relationships between the genes used to construct the matrix and additional information collected on the samples under study, and can be used to analyze individual genes or groups of genes identified in different ways. The technique can be used with any high-dimensional assay or data type and is ideally suited for testing subsets of genes defined by their participation in a biochemical pathway or other a priori grouping. We showcase the methodology using three published gene expression data sets.
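
The abstract describes the workflow without giving formulas, so the following is only a minimal sketch of how such an analysis might look. It assumes a correlation-based distance between samples, Gower centering of the squared distances, and a pseudo-F statistic of the form tr(HGH) / tr((I - H)G(I - H)) with a permutation p-value. All function names and the simulated data are hypothetical and are not taken from the paper.

```python
import numpy as np

def correlation_distance(expr):
    """expr has shape (N samples, G genes); returns an N x N distance matrix.

    Assumed distance: d_ij = sqrt(2 * (1 - r_ij)), r_ij = Pearson correlation
    between samples i and j across the G genes.
    """
    r = np.clip(np.corrcoef(expr), -1.0, 1.0)  # rows (samples) are the variables
    d = np.sqrt(2.0 * (1.0 - r))
    np.fill_diagonal(d, 0.0)
    return d

def gower_center(d):
    """Gower-centered matrix G = C A C, with A = -d^2 / 2 and C = I - 11'/N."""
    n = d.shape[0]
    a = -0.5 * d ** 2
    c = np.eye(n) - np.ones((n, n)) / n
    return c @ a @ c

def pseudo_f(g, x):
    """Pseudo-F statistic relating predictors x (N x p) to the centered matrix g."""
    n = g.shape[0]
    x1 = np.column_stack([np.ones(n), x])      # add an intercept column
    h = x1 @ np.linalg.pinv(x1)                # hat (projection) matrix
    m = np.linalg.matrix_rank(x1)
    resid = np.eye(n) - h
    explained = np.trace(h @ g @ h) / (m - 1)
    residual = np.trace(resid @ g @ resid) / (n - m)
    return explained / residual

def permutation_test(d, x, n_perm=999, seed=0):
    """Permute the rows of x to obtain a null distribution for the pseudo-F."""
    rng = np.random.default_rng(seed)
    n = d.shape[0]
    g = gower_center(d)
    x = np.asarray(x, dtype=float).reshape(n, -1)
    observed = pseudo_f(g, x)
    hits = sum(
        pseudo_f(g, x[rng.permutation(n)]) >= observed for _ in range(n_perm)
    )
    return observed, (hits + 1) / (n_perm + 1)

# Usage on simulated data: 20 samples, 50 genes, one binary predictor.
rng = np.random.default_rng(1)
group = np.repeat([0, 1], 10)
profiles = rng.normal(scale=2.0, size=(2, 50))         # one mean profile per group
expr = profiles[group] + rng.normal(size=(20, 50))     # samples x genes
dist = correlation_distance(expr)
f_stat, p_value = permutation_test(dist, group)
print(f"pseudo-F = {f_stat:.2f}, permutation p = {p_value:.3f}")
```

To test a subset of genes, such as a pathway, one would build the distance matrix from only those columns of `expr` and leave the rest of the procedure unchanged.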

Cited by 221 articles