In silico Estimates of Tissue Components in Surgical Samples Based on Expression Profiling Data
- 14 August 2010
- journal article
- Published by American Association for Cancer Research (AACR) in Cancer Research
- Vol. 70 (16), 6448-6455
- https://doi.org/10.1158/0008-5472.can-10-0021
Abstract
Tissue samples from many diseases have been used for gene expression profiling studies, but these samples often vary widely in the cell types they contain. Such variation could confound efforts to correlate expression with clinical parameters. In principle, the proportion of each major tissue component can be estimated from the profiling data and used to triage samples before studying correlations with disease parameters. Four large gene expression microarray data sets from prostate cancer, whose tissue components were estimated by pathologists, were used to test the performance of multivariate linear regression models for in silico prediction of major tissue components. Ten-fold cross-validation within each data set yielded average differences between the pathologists' predictions and the in silico predictions of 8% to 14% for the tumor component and 13% to 17% for the stroma component. Across independent data sets that used similar platforms and fresh frozen samples, the average differences were 11% to 12% for tumor and 12% to 17% for stroma. When the models were applied to 219 arrays of “tumor-enriched” samples in the literature, almost one quarter were predicted to have 30% or less tumor cells. Furthermore, there was a 10.5% difference in the average predicted tumor content between 37 recurrent and 42 nonrecurrent cancer patients. As a result, genes that correlated with tissue percentage generally also correlated with recurrence. If such a correlation is not desired, then some samples might be removed to rebalance the data set or tissue percentages might be incorporated into the prediction algorithm. A web service, “CellPred,” has been designed for the in silico prediction of sample tissue components based on expression data. Cancer Res; 70(16); 6448–55. ©2010 AACR.
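The evaluation described in the abstract (a multivariate linear regression model scored by ten-fold cross-validation against pathologist estimates) can be illustrated with a minimal sketch. This is not the CellPred implementation or the authors' gene selection procedure. The data are synthetic, and every name here (`make_synthetic`, `fit_linear`, `predict_linear`, `cross_validated_error`) is hypothetical. The sketch assumes only that a sample's expression of each predictor gene is a noisy mixture of a tumor level and a non-tumor level, weighted by tumor percentage.

```python
import numpy as np

def make_synthetic(n_samples=120, n_genes=5, noise=0.3, seed=1):
    """Synthetic stand-in for array data (an assumption, not real data).

    Each gene has one expression level in tumor and one in non-tumor tissue;
    a sample's value is the mixture weighted by its tumor percentage.
    """
    rng = np.random.default_rng(seed)
    tumor_pct = rng.uniform(0, 100, n_samples)      # plays the pathologist estimate
    tumor_level = rng.uniform(6, 10, n_genes)
    other_level = rng.uniform(6, 10, n_genes)
    frac = tumor_pct[:, None] / 100
    X = frac * tumor_level + (1 - frac) * other_level
    X = X + rng.normal(0, noise, (n_samples, n_genes))
    return X, tumor_pct

def fit_linear(X, y):
    """Least-squares fit of y ~ intercept + X @ coefficients."""
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta

def predict_linear(beta, X):
    """Predict tissue percentage, clipped to the valid 0-100 range."""
    A = np.column_stack([np.ones(len(X)), X])
    return np.clip(A @ beta, 0.0, 100.0)

def cross_validated_error(X, y, n_folds=10, seed=0):
    """Mean absolute difference between held-out predictions and y."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(y))
    abs_diffs = np.empty(len(y))
    for test_idx in np.array_split(order, n_folds):
        train_idx = np.setdiff1d(order, test_idx)   # all samples outside this fold
        beta = fit_linear(X[train_idx], y[train_idx])
        pred = predict_linear(beta, X[test_idx])
        abs_diffs[test_idx] = np.abs(pred - y[test_idx])
    return abs_diffs.mean()

if __name__ == "__main__":
    X, tumor_pct = make_synthetic()
    err = cross_validated_error(X, tumor_pct)
    print(f"10-fold mean absolute difference: {err:.1f} percentage points")
```

The reported quantity is in the same units as the abstract's "average differences" (percentage points of tissue content), but the number produced here reflects only the synthetic noise level and says nothing about the published results.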