Dimension Reduction and Clustering Models for Single-Cell RNA Sequencing Data: A Comparative Study
Open Access
- 22 March 2020
- journal article
- research article
- Published by MDPI AG in International Journal of Molecular Sciences
- Vol. 21 (6), 2181
- https://doi.org/10.3390/ijms21062181
Abstract
Recent advances in single-cell RNA sequencing (scRNA-seq) have generated enormous transcriptome datasets. These datasets have furthered our understanding of cellular heterogeneity and its underlying mechanisms in homogeneous populations. Clustering of scRNA-seq data can group cells belonging to the same cell type based on patterns embedded in gene expression. However, scRNA-seq data are high-dimensional, noisy, and sparse, owing to the limitations of existing scRNA-seq technologies. Traditional clustering methods are neither effective nor efficient for high-dimensional, sparse matrix computations, so several dimension reduction methods have been introduced. To validate a reliable and standard research routine, we conducted a comprehensive review and evaluation of four classical dimension reduction methods and five clustering models. Four experiments were performed progressively on two large scRNA-seq datasets using 20 models. The results showed that feature selection contributed positively to the analysis of high-dimensional and sparse scRNA-seq data. Feature-extraction methods were also able to improve clustering performance, although this improvement did not hold in every setting. Independent component analysis (ICA) performed well in small compressed feature spaces, whereas principal component analysis (PCA) was steadier than all the other feature-extraction methods. In addition, ICA was not ideal for fuzzy C-means clustering in scRNA-seq data analysis. K-means clustering achieved good results when combined with feature-extraction methods.
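The routine described in the abstract (feature selection, then feature extraction, then clustering) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the variance-based gene selection, the synthetic Poisson counts, and all parameter values are assumptions chosen for illustration, and only NumPy is used. PCA stands in for the feature-extraction step and a basic Lloyd's-algorithm K-means for the clustering step.

```python
import numpy as np


def select_variable_genes(counts, n_genes):
    """Feature selection (assumed criterion): log-transform the counts and
    keep the n_genes columns with the highest variance across cells."""
    logged = np.log1p(counts)
    order = np.argsort(logged.var(axis=0))[::-1]
    keep = np.sort(order[:n_genes])
    return logged[:, keep]


def pca(x, n_components):
    """Feature extraction: project centered data onto its leading
    principal components, computed with a thin SVD."""
    centered = x - x.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


def kmeans(x, k, n_iter=50, seed=0):
    """Clustering: plain Lloyd's algorithm with random initial centers
    drawn from the data points."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance from every cell to every center.
        dists = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # An empty cluster keeps its previous center.
        new_centers = np.array([
            x[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


# Synthetic stand-in for an scRNA-seq count matrix (cells x genes):
# two hypothetical cell types, each with 20 marker genes, mostly zeros elsewhere.
rng = np.random.default_rng(0)
n_cells_per_type, n_total_genes = 30, 200
rates = np.full((2 * n_cells_per_type, n_total_genes), 0.2)
rates[:n_cells_per_type, :20] = 20.0      # markers of type A
rates[n_cells_per_type:, 20:40] = 20.0    # markers of type B
counts = rng.poisson(lam=rates)

selected = select_variable_genes(counts, n_genes=40)
embedding = pca(selected, n_components=5)
labels = kmeans(embedding, k=2)

print("embedding shape:", embedding.shape)
print("cluster sizes:", np.bincount(labels, minlength=2))
```

Swapping `pca` for another feature-extraction step, or `kmeans` for another clustering model, yields the kind of method pairing that the study compares.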
Funding Information
- The National Natural Science Foundation of China (61972174, 61602207 and 61572228)