Dimension Reduction and Clustering Models for Single-Cell RNA Sequencing Data: A Comparative Study

Open Access

22 March 2020

journal article
research article
Published by MDPI AG in International Journal of Molecular Sciences

Vol. 21 (6), 2181
https://doi.org/10.3390/ijms21062181

Abstract

With recent advances in single-cell RNA sequencing, enormous transcriptome datasets have been generated. These datasets have furthered our understanding of cellular heterogeneity and its underlying mechanisms in homogeneous populations. Single-cell RNA sequencing (scRNA-seq) data clustering can group cells belonging to the same cell type based on patterns embedded in gene expression. However, scRNA-seq data are high-dimensional, noisy, and sparse, owing to the limitations of existing scRNA-seq technologies. Traditional clustering methods are neither effective nor efficient for high-dimensional and sparse matrix computations. Therefore, several dimension reduction methods have been introduced. To validate a reliable and standard research routine, we conducted a comprehensive review and evaluation of four classical dimension reduction methods and five clustering models. Four experiments were performed progressively on two large scRNA-seq datasets using 20 models. The results showed that feature selection benefited the analysis of high-dimensional and sparse scRNA-seq data. Moreover, feature-extraction methods were able to improve clustering performance, although the improvement did not hold in every setting. Independent component analysis (ICA) performed well in small compressed feature spaces, whereas principal component analysis (PCA) was steadier than all the other feature-extraction methods. In addition, ICA was not ideal for fuzzy C-means clustering in scRNA-seq data analysis. K-means clustering achieved good results when combined with feature-extraction methods.
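The routine described in the abstract (feature selection, then feature extraction, then clustering) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes scikit-learn, uses a synthetic Poisson count matrix in place of real scRNA-seq data, substitutes a simple variance-based gene filter for the feature selection step, and scores the result with the adjusted Rand index. The function names and all parameter values (numbers of cells, genes, components, and clusters) are hypothetical choices made for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA, FastICA
from sklearn.metrics import adjusted_rand_score


def simulate_counts(n_cells=300, n_genes=500, n_types=3, seed=0):
    """Build a toy sparse count matrix (assumption: NOT real scRNA-seq data)."""
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, n_types, size=n_cells)
    # Low background rate everywhere, so most entries are zero.
    rates = np.full((n_types, n_genes), 0.3)
    block = 30
    # Each hypothetical cell type gets its own block of marker genes.
    for t in range(n_types):
        rates[t, t * block:(t + 1) * block] = 5.0
    counts = rng.poisson(rates[labels])
    return counts.astype(float), labels


def select_variable_genes(x, n_top=100):
    """Feature selection: keep the n_top genes with the highest variance."""
    order = np.argsort(x.var(axis=0))[::-1]
    return x[:, order[:n_top]]


def reduce_and_cluster(x, reducer, n_clusters, seed=0):
    """Feature extraction with the given reducer, followed by K-means."""
    embedding = reducer.fit_transform(x)
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    return km.fit_predict(embedding)


counts, truth = simulate_counts()
logged = np.log1p(counts)  # common variance-stabilising transform
selected = select_variable_genes(logged, n_top=100)

# Two of the feature-extraction methods named in the abstract.
reducers = {
    "PCA": PCA(n_components=10, random_state=0),
    "ICA": FastICA(n_components=10, random_state=0, max_iter=1000),
}

scores = {}
for name, reducer in reducers.items():
    predicted = reduce_and_cluster(selected, reducer, n_clusters=3)
    scores[name] = adjusted_rand_score(truth, predicted)
    print(f"{name} + K-means ARI: {scores[name]:.3f}")
```

Swapping the reducer or the clustering step in `reduce_and_cluster` gives the kind of method-by-method grid the study compares. Scores on this toy data say nothing about performance on real datasets.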

Funding Information

The National Natural Science Foundation of China (61972174, 61602207 and 61572228)

