Fast and precise single-cell data analysis using a hierarchical autoencoder
Open Access
- 15 February 2021
- journal article
- research article
- Published by Springer Science and Business Media LLC in Nature Communications
- Vol. 12 (1), 1-10
- https://doi.org/10.1038/s41467-021-21312-2
Abstract
A primary challenge in single-cell RNA sequencing (scRNA-seq) studies comes from the massive amount of data and the excess noise level. To address this challenge, we introduce an analysis framework, named single-cell Decomposition using Hierarchical Autoencoder (scDHA), that reliably extracts representative information of each cell. The scDHA pipeline consists of two core modules. The first module is a non-negative kernel autoencoder able to remove genes or components that have insignificant contributions to the part-based representation of the data. The second module is a stacked Bayesian autoencoder that projects the data onto a low-dimensional space (compressed). To diminish the tendency to overfit of neural networks, we repeatedly perturb the compressed space to learn a more generalized representation of the data. In an extensive analysis, we demonstrate that scDHA outperforms state-of-the-art techniques in many research sub-fields of scRNA-seq analysis, including cell segregation through unsupervised learning, visualization of transcriptome landscape, cell classification, and pseudo-time inference.
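The two modules described in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation. The function names (`fit_nonneg_encoder`, `select_top_genes`, `perturb_latent`), the linear one-layer architecture, the projected gradient descent, and the per-gene score (variance of the encoder weights) are all assumptions made for illustration. The perturbation step uses the standard reparameterization from variational autoencoders as a stand-in for "perturbing the compressed space".

```python
import numpy as np


def fit_nonneg_encoder(X, n_hidden=8, n_epochs=200, lr=0.01, seed=0):
    """Fit a linear autoencoder whose encoder weights are kept non-negative.

    X is a (cells x genes) matrix scaled to [0, 1]. Returns the encoder
    weights W (genes x hidden) and one score per gene. All hyperparameters
    here are illustrative assumptions, not values taken from the paper.
    """
    rng = np.random.default_rng(seed)
    n_cells, n_genes = X.shape
    W = rng.uniform(0.0, 0.1, size=(n_genes, n_hidden))  # encoder, constrained >= 0
    V = rng.normal(0.0, 0.1, size=(n_hidden, n_genes))   # decoder, unconstrained
    for _ in range(n_epochs):
        H = X @ W                       # hidden representation
        R = H @ V - X                   # reconstruction residual
        grad_V = H.T @ R / n_cells
        grad_W = X.T @ (R @ V.T) / n_cells
        V -= lr * grad_V
        W -= lr * grad_W
        np.maximum(W, 0.0, out=W)       # project back onto non-negative weights
    scores = W.var(axis=1)              # assumed score: weight variance per gene
    return W, scores


def select_top_genes(scores, k):
    """Return the indices of the k genes with the largest scores."""
    return np.argsort(scores)[::-1][:k]


def perturb_latent(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1) (VAE reparameterization)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps


# Tiny synthetic example: 50 cells, 20 genes, keep the 5 highest-scoring genes.
rng = np.random.default_rng(1)
X_demo = rng.uniform(0.0, 1.0, size=(50, 20))
W_demo, scores_demo = fit_nonneg_encoder(X_demo)
kept_genes = select_top_genes(scores_demo, k=5)
X_filtered = X_demo[:, kept_genes]      # input that a second module would compress

# Repeated perturbation of a (hypothetical) latent code of dimension 2.
mu_demo = np.zeros((50, 2))
log_var_demo = np.zeros((50, 2))
z_samples = [perturb_latent(mu_demo, log_var_demo, rng) for _ in range(3)]
```

In this sketch the non-negativity constraint is enforced by clipping after every gradient step, and the filtered matrix `X_filtered` stands in for the input to the second, compressing module. Drawing several `z_samples` from the same mean and variance mimics the idea of repeatedly perturbing the compressed space so that the decoder does not fit a single point estimate.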
Funding Information
- U.S. Department of Health & Human Services | NIH | National Institute of General Medical Sciences (GM103440)
- National Science Foundation (2001385, 2019609)
- National Aeronautics and Space Administration (80NSSC19M0170, NNX15AI02H)