An integrated landscape of protein expression in human cancer

Open Access

23 April 2021

journal article
research article
Published by Springer Science and Business Media LLC in Scientific Data

Vol. 8 (1), 1-14
https://doi.org/10.1038/s41597-021-00890-2

Abstract

Using 11 proteomics datasets, most of them available through the PRIDE database, we assembled a reference expression map for 191 cancer cell lines and 246 clinical tumour samples across 13 lineages. We found unique peptides that were identified only in tumour samples, despite much higher coverage in cell lines. These peptides mapped mainly to proteins related to the regulation of signalling receptor activity. We calculated correlations between baseline expression in cell lines and tumours and found expression to be highly similar across all samples, with the greatest similarity within a given sample type. Integrating proteomics and transcriptomics data gave a median correlation across cell lines of 0.58 (range 0.43 to 0.66). Additionally, in agreement with previous studies, variation in mRNA levels was often a poor predictor of changes in protein abundance. To our knowledge, this work constitutes the first meta-analysis focusing on cancer-related public proteomics datasets. We therefore also highlight the shortcomings and limitations of such studies. All data are available through PRIDE under dataset identifier PXD013455 and in Expression Atlas.
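
The protein-to-mRNA comparison summarised in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline. The abstract does not state which correlation coefficient was used, so Pearson correlation is an assumption here. The function names, the nested-dictionary layout, and the toy values are likewise hypothetical. The sketch correlates protein and mRNA abundance within each cell line over the genes quantified in both data types, then reports the median and range across cell lines.

```python
from math import sqrt
from statistics import mean, median


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences.

    Assumes neither sequence is constant (a constant sequence would
    give a zero denominator).
    """
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def per_cell_line_correlation(protein, mrna, min_genes=3):
    """Correlate protein and mRNA abundance within each cell line.

    `protein` and `mrna` map cell line -> {gene: abundance}
    (hypothetical layout). Only genes present in both data types
    for a given cell line are compared.
    """
    result = {}
    for line in protein.keys() & mrna.keys():
        shared = sorted(protein[line].keys() & mrna[line].keys())
        if len(shared) < min_genes:
            continue  # too few shared genes for a meaningful estimate
        p = [protein[line][g] for g in shared]
        m = [mrna[line][g] for g in shared]
        result[line] = pearson(p, m)
    return result


# Toy values for illustration only; not data from the study.
protein = {
    "line_A": {"G1": 1.0, "G2": 2.0, "G3": 3.0, "G4": 4.0},
    "line_B": {"G1": 2.0, "G2": 1.0, "G3": 4.0, "G4": 3.0},
}
mrna = {
    "line_A": {"G1": 2.0, "G2": 4.0, "G3": 6.0, "G4": 8.0},
    "line_B": {"G1": 1.0, "G2": 2.0, "G3": 3.0, "G4": 4.0},
}

corr = per_cell_line_correlation(protein, mrna)
values = list(corr.values())
summary = (median(values), min(values), max(values))
print("median, min, max:", summary)
```

A rank-based coefficient such as Spearman could be substituted for `pearson` without changing the rest of the sketch; which is more appropriate depends on how the abundances were normalised.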

Funding Information

European Bioinformatics Institute
