A comparative assessment on gene expression classification methods of RNA-seq data generated using next-generation sequencing (NGS)

Open Access

1 April 2022

journal article
Published by Narra T in Narra J

Vol. 2 (1)
https://doi.org/10.52225/narra.v2i1.60

Abstract

Next-generation sequencing (NGS), also called massively parallel sequencing, has revolutionized genomic research. RNA sequencing (RNA-Seq) can profile gene expression for molecular diagnosis, disease classification, and the identification of potential disease markers. Several methods proposed for classifying gene expression are based on microarray data, which are measured on a continuous scale, or require an assumption of normality. Because RNA-Seq data do not meet those requirements, these methods cannot be applied directly. In this study, we compare several classifiers: Logistic Regression, Support Vector Machine, Classification and Regression Trees, and Random Forest. A simulation study that varies parameters such as overdispersion and the differential expression rate is conducted, and the results are compared with those from two mRNA experimental datasets. Six performance indicators are used to measure predictive accuracy: Percentage Correctly Classified, Area Under the Receiver Operating Characteristic (ROC) Curve, the Kolmogorov-Smirnov Statistic, Partial Gini Index, H-measure, and Brier Score. The results show that Random Forest outperforms the other classification algorithms.
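As an illustration of four of the six performance indicators named in the abstract, the following is a minimal sketch in plain Python. It is not the authors' implementation: the function names, the 0.5 classification threshold, and the toy labels and scores are assumptions made for this example only, and the Partial Gini Index and H-measure are omitted because their definitions depend on choices the abstract does not state.

```python
# Toy data, invented for illustration only (not taken from the study):
# y_true holds binary class labels, scores holds predicted P(class == 1).
y_true = [0, 0, 1, 1, 0, 1]
scores = [0.10, 0.40, 0.35, 0.80, 0.20, 0.90]


def split_by_class(y_true, scores):
    """Return (scores of positives, scores of negatives)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    return pos, neg


def pcc(y_true, scores, threshold=0.5):
    """Percentage Correctly Classified at a fixed threshold (0.5 is an assumption)."""
    hits = sum((s >= threshold) == bool(y) for y, s in zip(y_true, scores))
    return 100.0 * hits / len(y_true)


def auc(y_true, scores):
    """Area under the ROC curve: the fraction of (positive, negative) pairs
    in which the positive case scores higher, counting ties as one half."""
    pos, neg = split_by_class(y_true, scores)
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def ks_statistic(y_true, scores):
    """Largest gap between the empirical score CDFs of the two classes."""
    pos, neg = split_by_class(y_true, scores)
    best = 0.0
    for t in sorted(set(scores)):
        cdf_pos = sum(s <= t for s in pos) / len(pos)
        cdf_neg = sum(s <= t for s in neg) / len(neg)
        best = max(best, abs(cdf_pos - cdf_neg))
    return best


def brier(y_true, scores):
    """Mean squared difference between predicted probability and outcome."""
    return sum((s - y) ** 2 for y, s in zip(y_true, scores)) / len(y_true)


if __name__ == "__main__":
    print("PCC  :", pcc(y_true, scores))
    print("AUC  :", auc(y_true, scores))
    print("KS   :", ks_statistic(y_true, scores))
    print("Brier:", brier(y_true, scores))
```

Higher values indicate better performance for the first three indicators, while a lower Brier Score is better. In a comparison like the one described, each classifier's predicted probabilities on held-out samples would be passed through the same set of functions.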
