A comparative assessment on gene expression classification methods of RNA-seq data generated using next-generation sequencing (NGS)
Open Access
- 1 April 2022
- Vol. 2 (1)
- https://doi.org/10.52225/narra.v2i1.60
Abstract
Next-generation sequencing (NGS), or massively parallel sequencing, has revolutionized genomic research. RNA sequencing (RNA-Seq) can profile gene expression, which can be used for molecular diagnosis and disease classification, and to identify potential disease markers. Many of the methods proposed for classifying gene expression were developed for microarray data, which are measured on a continuous scale, or they require a normal distribution assumption. Because RNA-Seq data consist of read counts and do not meet these requirements, such methods cannot be applied directly. In this study, we compare several classifiers, including Logistic Regression, Support Vector Machine, Classification and Regression Trees (CART) and Random Forest. We conduct a simulation study that varies parameters such as overdispersion and the differential expression rate, and we compare its results with those obtained on two experimental mRNA datasets. Six performance indicators are used to measure predictive accuracy: Percentage Correctly Classified (PCC), Area Under the Receiver Operating Characteristic (ROC) Curve (AUC), the Kolmogorov-Smirnov statistic, the partial Gini index, the H-measure and the Brier score. The results show that Random Forest outperforms the other classification algorithms.
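The abstract names six accuracy indicators but does not define them. The following minimal sketch in plain Python (no third-party libraries) computes the four indicators that have simple definitions. The inputs are true 0/1 labels and predicted class-1 probabilities.

Several details are illustrative assumptions and not the authors' implementation:
- the function names,
- the 0.5 cut-off used for PCC,
- the convention that label 1 is the positive class.

The partial Gini index and the H-measure are left out because each needs an extra choice. The partial Gini index needs a restricted region of the ROC curve, and the H-measure needs a distribution over misclassification costs.

```python
from typing import Sequence


def pcc(y_true: Sequence[int], p_pred: Sequence[float], threshold: float = 0.5) -> float:
    """Percentage Correctly Classified, returned as a fraction in [0, 1].
    The 0.5 default threshold is an assumption for illustration."""
    correct = sum(int(p >= threshold) == y for y, p in zip(y_true, p_pred))
    return correct / len(y_true)


def _split_by_class(y_true, p_pred):
    """Separate scores of positive (label 1) and negative (label 0) samples."""
    pos = [p for y, p in zip(y_true, p_pred) if y == 1]
    neg = [p for y, p in zip(y_true, p_pred) if y == 0]
    return pos, neg


def auc(y_true: Sequence[int], p_pred: Sequence[float]) -> float:
    """Area under the ROC curve via the Mann-Whitney formulation: the
    probability that a random positive outscores a random negative
    (ties count as one half)."""
    pos, neg = _split_by_class(y_true, p_pred)
    wins = 0.0
    for a in pos:
        for b in neg:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def ks_statistic(y_true: Sequence[int], p_pred: Sequence[float]) -> float:
    """Kolmogorov-Smirnov statistic: the largest gap between the empirical
    CDFs of the scores of the two classes, i.e. max |TPR - FPR|."""
    pos, neg = _split_by_class(y_true, p_pred)
    best = 0.0
    for t in sorted(set(p_pred)):
        cdf_pos = sum(p <= t for p in pos) / len(pos)
        cdf_neg = sum(p <= t for p in neg) / len(neg)
        best = max(best, abs(cdf_pos - cdf_neg))
    return best


def brier_score(y_true: Sequence[int], p_pred: Sequence[float]) -> float:
    """Brier score: mean squared difference between the predicted
    probability and the observed 0/1 outcome (lower is better)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, p_pred)) / len(y_true)


if __name__ == "__main__":
    # Toy labels and predicted probabilities, made up for illustration only.
    y = [0, 0, 1, 1]
    p = [0.1, 0.4, 0.35, 0.8]
    # Prints PCC 0.75, AUC 0.75, KS 0.5 and a Brier score of about 0.158.
    print(pcc(y, p), auc(y, p), ks_statistic(y, p), brier_score(y, p))
```

For this toy input, three of the four samples fall on the correct side of 0.5, so PCC is 0.75. Three of the four positive/negative pairs are ranked correctly, so AUC is also 0.75. The first three indicators reward correct classification or ranking, so higher is better. The Brier score instead penalizes poorly calibrated probabilities, so lower is better.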