Grid Search-Based Hyperparameter Tuning and Classification of Microarray Cancer Data

1 February 2019

conference paper
Published by Institute of Electrical and Electronics Engineers (IEEE)

Abstract

Cancer is a group of diseases caused by abnormal cell growth. Advances in microarray technology have produced a large variety of microarray cancer datasets, opening avenues for research across disciplines such as statistics, computational biology, genomics, and related fields. The main challenges in analyzing microarray cancer data are the curse of dimensionality, small sample size, noisy data, and class imbalance. In this work, we propose grid search-based hyperparameter tuning (GSHPT) of random forest parameters to classify microarray cancer data. The grid search evaluates a fixed set of candidate parameter values and selects the combination that gives the best accuracy under n-fold cross-validation; in this work, 10-fold cross-validation is used. The grid search yields the best values for the number of features to consider at each split, the number of trees in the forest, the maximum depth of a tree, and the minimum number of samples required to be split at the leaf node. The maximum number of trees is 10, 20, and 70 for the Ovarian, 3-class Leukemia, and 3-class Leukemia cancer data, respectively. For MLL and SRBCT, 50 trees are generated to achieve the maximum classification accuracy. The Gini index is used as the criterion for splitting nodes, and the maximum depth of the tree is set to 2 for all datasets. The performance of the proposed method is evaluated using standard metrics: classification accuracy, precision, recall, F1-score, confusion matrix, and misclassification rate. Experimental results, including a comparative analysis, show an improvement over state-of-the-art methods.
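To make the tuning procedure concrete, the sketch below shows how a grid search over the four random forest parameters named in the abstract, scored by 10-fold cross-validation, might be written with scikit-learn's `GridSearchCV` and `RandomForestClassifier`, followed by the evaluation metrics listed above. This is not the authors' code: scikit-learn is an assumed library, the synthetic `make_classification` data is a hypothetical stand-in for the microarray datasets, and the candidate values in `param_grid` are hypothetical apart from the tree counts and the depth of 2 reported in the abstract. Mapping the abstract's "minimum number of samples" parameter to `min_samples_split` is also an assumption.

```python
# A minimal sketch of grid search-based hyperparameter tuning for a random
# forest, assuming scikit-learn. Data and grid values are hypothetical.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (accuracy_score, classification_report,
                             confusion_matrix, f1_score,
                             precision_score, recall_score)
from sklearn.model_selection import (GridSearchCV, StratifiedKFold,
                                     train_test_split)

# Hypothetical microarray-like data: few samples, many features, 3 classes.
X, y = make_classification(n_samples=120, n_features=500, n_informative=20,
                           n_classes=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Hypothetical grid over the four parameters named in the abstract.
param_grid = {
    "max_features": ["sqrt", "log2"],   # features considered at each split
    "n_estimators": [10, 20, 50, 70],   # number of trees in the forest
    "max_depth": [2],                   # depth fixed at 2, as in the abstract
    "min_samples_split": [2, 4],        # assumption: the "minimum samples" parameter
}

# Every grid combination is scored by stratified 10-fold cross-validation;
# the best combination is then refit on the whole training split.
search = GridSearchCV(
    RandomForestClassifier(criterion="gini", random_state=0),
    param_grid,
    cv=StratifiedKFold(n_splits=10, shuffle=True, random_state=0),
    scoring="accuracy",
)
search.fit(X_train, y_train)
print("Best parameters:", search.best_params_)

# Evaluate the refit best model on held-out data with the abstract's metrics.
y_pred = search.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred, average="macro", zero_division=0)
recall = recall_score(y_test, y_pred, average="macro", zero_division=0)
f1 = f1_score(y_test, y_pred, average="macro", zero_division=0)
cm = confusion_matrix(y_test, y_pred)
misclassification_rate = 1.0 - accuracy  # fraction of test samples misclassified

print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} f1={f1:.3f} error={misclassification_rate:.3f}")
print(cm)
print(classification_report(y_test, y_pred, zero_division=0))
```

Macro averaging (an assumed choice here) weights every class equally, which matters when the classes are imbalanced; `search.cv_results_` holds the cross-validated score of every grid combination if a per-setting comparison is needed.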

This publication has 21 references indexed in Scilit.

Cited by 90 articles