A Higher-Order Generalized Singular Value Decomposition for Comparison of Global mRNA Expression from Multiple Organisms
Open Access
- 22 December 2011
- journal article
- research article
- Published by Public Library of Science (PLoS) in PLOS ONE
- Vol. 6 (12), e28072
- https://doi.org/10.1371/journal.pone.0028072
Abstract
The number of high-dimensional datasets recording multiple aspects of a single phenomenon is increasing in many areas of science, accompanied by a need for mathematical frameworks that can compare multiple large-scale matrices with different row dimensions. The only such framework to date, the generalized singular value decomposition (GSVD), is limited to two matrices. We mathematically define a higher-order GSVD (HO GSVD) for N≥2 matrices Di ∈ ℝ^(mi×n), each with full column rank. Each matrix is exactly factored as Di = UiΣiV^T, where V, identical in all factorizations, is obtained from the eigensystem SV = VΛ of the arithmetic mean S of all pairwise quotients AiAj^(-1) of the matrices Ai = Di^T Di, i≠j. We prove that this decomposition extends to higher orders almost all of the mathematical properties of the GSVD. The matrix S is nondefective with V and Λ real. Its eigenvalues satisfy λk≥1. Equality holds if and only if the corresponding eigenvector vk is a right basis vector of equal significance in all matrices Di and Dj, that is σi,k/σj,k = 1 for all i and j, and the corresponding left basis vector ui,k is orthogonal to all other vectors in Ui for all i. The eigenvalues λk = 1, therefore, define the “common HO GSVD subspace.” We illustrate the HO GSVD with a comparison of genome-scale cell-cycle mRNA expression from S. pombe, S. cerevisiae and human. Unlike existing algorithms, a mapping among the genes of these disparate organisms is not required. We find that the approximately common HO GSVD subspace represents the cell-cycle mRNA expression oscillations, which are similar among the datasets. Simultaneous reconstruction in the common subspace, therefore, removes the experimental artifacts, which are dissimilar, from the datasets. In the simultaneous sequence-independent classification of the genes of the three organisms in this common subspace, genes of highly conserved sequences but significantly different cell-cycle peak times are correctly classified.
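The construction described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function name `ho_gsvd` and its return layout are hypothetical, the quotients are formed with explicit matrix inverses for readability (an assumption that is reasonable only for small, well-conditioned inputs), and the columns of V are normalized to unit length so that the singular values σi,k can be read off as column norms of Di V^(-T).

```python
import numpy as np


def ho_gsvd(matrices):
    """Illustrative HO GSVD sketch (hypothetical helper, not the paper's code).

    matrices: list of N >= 2 real arrays D_i of shape (m_i, n),
              each assumed to have full column rank.
    Returns (Us, sigmas, V, eigvals) with D_i = U_i @ diag(sigma_i) @ V.T.
    """
    N = len(matrices)
    n = matrices[0].shape[1]

    # A_i = D_i^T D_i is n x n and invertible when D_i has full column rank.
    A = [D.T @ D for D in matrices]
    A_inv = [np.linalg.inv(a) for a in A]

    # S is the arithmetic mean of all pairwise quotients A_i A_j^{-1}, i != j.
    S = np.zeros((n, n))
    for i in range(N):
        for j in range(N):
            if i != j:
                S += A[i] @ A_inv[j]
    S /= N * (N - 1)

    # Eigensystem S V = V Lambda; the abstract states V and Lambda are real,
    # so any imaginary round-off returned by the solver is discarded.
    eigvals, V = np.linalg.eig(S)
    eigvals, V = eigvals.real, V.real
    V = V / np.linalg.norm(V, axis=0)  # unit-length columns (a convention)

    Us, sigmas = [], []
    for D in matrices:
        # Solve V B^T = D^T, so that B = D V^{-T} and D = B V^T.
        B = np.linalg.solve(V, D.T).T
        sigma = np.linalg.norm(B, axis=0)  # sigma_{i,k} = ||b_{i,k}||
        Us.append(B / sigma)               # U_i has unit-length columns
        sigmas.append(sigma)
    return Us, sigmas, V, eigvals


# Usage on random data with different row counts and a shared column count.
rng = np.random.default_rng(0)
datasets = [rng.standard_normal((m, 4)) for m in (10, 12, 15)]
Us, sigmas, V, eigvals = ho_gsvd(datasets)
```

Each input is recovered as `(U * sigma) @ V.T`, and eigenvalues close to 1 flag the directions that the abstract calls the approximately common HO GSVD subspace. The columns of each U_i here are normalized but not, in general, orthogonal to one another.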