Grid Search-Based Hyperparameter Tuning and Classification of Microarray Cancer Data
- 1 February 2019
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
Abstract
Cancer is a group of diseases caused by abnormal cell growth. Advances in microarray technology have produced a large variety of microarray cancer datasets, opening avenues for research across disciplines such as statistics, computational biology, genomics, and related fields. The main challenges in analyzing microarray cancer data are the curse of dimensionality, small sample size, noisy data, and class imbalance. In this work, we propose grid search-based hyperparameter tuning (GSHPT) of random forest parameters to classify microarray cancer data. The grid search is defined over a fixed set of parameter values, and the combination that gives the best accuracy under n-fold cross-validation is selected; in our work, 10-fold cross-validation is used. The grid search algorithm provides the best parameters, namely the number of features to consider at each split, the number of trees in the forest, the maximum depth of the tree, and the minimum number of samples required to be split at the leaf node. The maximum number of trees considered is 10, 20 and 70, respectively, for the Ovarian, 3-class Leukemia, and 3-class Leukemia cancer data. For MLL and SRBCT, 50 trees are generated to achieve the maximum classification accuracy. The Gini index is used as the criterion for splitting nodes, and the maximum depth of the tree is set to 2 for all datasets. Experimental results show an improvement over state-of-the-art methods. The performance of the proposed method is evaluated using standard metrics, namely classification accuracy, precision, recall, F1-score, confusion matrix, and misclassification rate, and a comparative analysis is reported.
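The tuning procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the authors' implementation. Several things here are assumptions that do not come from the paper: the synthetic data from `make_classification` stands in for a microarray dataset, the candidate values for `max_features` and `min_samples_leaf` are hypothetical, the 25% hold-out split is added only so the metrics have unseen samples to score, and macro averaging is one possible choice for the multi-class metrics. Only the 10-fold cross-validation, the Gini criterion, the maximum depth of 2, and the tree counts (10, 20, 50, 70) are taken from the abstract.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_recall_fscore_support)
from sklearn.model_selection import (GridSearchCV, StratifiedKFold,
                                     train_test_split)

# Assumption: synthetic stand-in for microarray data (few samples, many features).
X, y = make_classification(n_samples=120, n_features=500, n_informative=20,
                           n_classes=3, n_clusters_per_class=1, random_state=0)

# Assumption: a stratified hold-out split, used only to report the metrics.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Fixed grid of candidate values. Tree counts and max_depth follow the abstract;
# the max_features and min_samples_leaf candidates are hypothetical.
param_grid = {
    "n_estimators": [10, 20, 50, 70],
    "max_depth": [2],
    "max_features": ["sqrt", "log2"],
    "min_samples_leaf": [1, 2],
}

# 10-fold stratified cross-validation, scored by classification accuracy.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
search = GridSearchCV(
    RandomForestClassifier(criterion="gini", random_state=0),
    param_grid, scoring="accuracy", cv=cv)
search.fit(X_train, y_train)

# Evaluate the refitted best model with the metrics listed in the abstract.
y_pred = search.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
precision, recall, f1, _ = precision_recall_fscore_support(
    y_test, y_pred, average="macro", zero_division=0)
cm = confusion_matrix(y_test, y_pred)
misclassification_rate = 1.0 - accuracy

print("best parameters:", search.best_params_)
print("mean CV accuracy:", search.best_score_)
print("accuracy:", accuracy, "misclassification rate:", misclassification_rate)
print("precision:", precision, "recall:", recall, "F1:", f1)
print(cm)
```

`GridSearchCV` evaluates every combination in `param_grid` on each fold and refits the best one on the full training split, so the cost grows with the product of the candidate list lengths times the number of folds. With real microarray data, the synthetic `X` and `y` would be replaced by the expression matrix and class labels.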