Robustness and applicability of transcription factor and pathway analysis tools on single-cell RNA-seq data
Open Access
- 12 February 2020
- journal article
- research article
- Published by Springer Science and Business Media LLC in Genome Biology
- Vol. 21 (1), 1-19
- https://doi.org/10.1186/s13059-020-1949-z
Abstract
Many functional analysis tools have been developed to extract functional and mechanistic insight from bulk transcriptome data. With the advent of single-cell RNA sequencing (scRNA-seq), it is in principle possible to do such an analysis for single cells. However, scRNA-seq data has characteristics such as drop-out events and low library sizes. It is thus not clear if functional transcription factor (TF) and pathway analysis tools established for bulk sequencing can be applied to scRNA-seq in a meaningful way. To address this question, we perform benchmark studies on simulated and real scRNA-seq data. We include the bulk-RNA tools PROGENy, GO enrichment, and DoRothEA that estimate pathway and TF activities, respectively, and compare them against the tools SCENIC/AUCell and metaVIPER, designed for scRNA-seq. For the in silico study, we simulate single cells from TF/pathway perturbation bulk RNA-seq experiments. We complement the simulated data with real scRNA-seq data upon CRISPR-mediated knock-out. Our benchmarks on simulated and real data reveal comparable performance to the original bulk data. Additionally, we show that the TF and pathway activities preserve cell type-specific variability by analyzing a mixture sample sequenced with 13 scRNA-seq protocols. We also provide the benchmark data for further use by the community. Our analyses suggest that bulk-based functional analysis tools that use manually curated footprint gene sets can be applied to scRNA-seq data, partially outperforming dedicated single-cell tools. Furthermore, we find that the performance of functional analysis tools is more sensitive to the gene sets than to the statistic used.
Funding Information
- Bundesministerium für Bildung und Forschung (FKZ: 031L0049)
- National Institutes of Health (U54-CA217377)
- Ministry of Science, Innovation and Universities (SAF2017-89109-P)