Addressing Optimisation Challenges for Datasets with Many Variables, Using Genetic Algorithms to Implement Feature Selection
Open Access
- 28 March 2022
- journal article
- Published by IntechOpen in AI, Computer Science and Robotics Technology
- Vol. 2022, 1-21
- https://doi.org/10.5772/acrt.01
Abstract
This article presents an optimisation method that uses a Genetic Algorithm to perform feature selection on large datasets, with the aim of improving accuracy. The gains come from improved classification and a reduced number of features, which also makes the model easier to interpret. A clinical dataset on heart failure is used to illustrate the nature of the problem and to show the effectiveness of the techniques developed. Clinical datasets sometimes have many variables. For instance, blood biochemistry data has more than 60 variables, which complicates the development of outcome predictions using machine-learning and other algorithms. Hence, techniques to make such datasets more tractable are required. Genetic Algorithms can provide an efficient method of low numerical complexity for selecting features. In this paper, a way to estimate the number of required variables is presented, and a genetic algorithm is used in a “wrapper” form to select features for a case study of heart failure data. Additionally, different initial populations and termination conditions are used to arrive at a set of optimal features, which are then compared with the features obtained using traditional methodologies. The paper provides a framework for estimating the number of variables and generations required for a suitable solution.
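The “wrapper” idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. Everything here is an assumption made for illustration: the synthetic two-class data, the nearest-centroid classifier standing in for the real model, the per-feature penalty, and all function and parameter names (`make_data`, `centroid_accuracy`, `fitness`, `select_features`, `patience`). Each candidate solution is a bit mask over the features, its fitness is the accuracy of a classifier trained on the selected features only, and the search stops after a fixed number of generations or when the best fitness stops improving.

```python
import random


def make_data(n_samples=120, n_features=12, n_informative=3, seed=0):
    """Synthetic two-class data (assumption): only the first
    n_informative features carry any class signal."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        row = [rng.gauss(2.0 * label if j < n_informative else 0.0, 1.0)
               for j in range(n_features)]
        X.append(row)
        y.append(label)
    return X, y


def centroid_accuracy(X_train, y_train, X_test, y_test, mask):
    """Wrapper step: fit a nearest-centroid classifier on the masked
    features and return its accuracy on the held-out rows."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0  # an empty feature set cannot classify anything
    centroids = {}
    for label in (0, 1):
        rows = [x for x, t in zip(X_train, y_train) if t == label]
        centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in cols]
    correct = 0
    for x, t in zip(X_test, y_test):
        dist = {label: sum((x[j] - c) ** 2 for j, c in zip(cols, cen))
                for label, cen in centroids.items()}
        correct += (min(dist, key=dist.get) == t)
    return correct / len(y_test)


def fitness(mask, data, penalty=0.01):
    """Accuracy minus a small cost per selected feature (assumed form)."""
    X_train, y_train, X_test, y_test = data
    return centroid_accuracy(X_train, y_train, X_test, y_test, mask) - penalty * sum(mask)


def select_features(data, n_features, pop_size=20, max_generations=30,
                    patience=8, mutation_rate=0.05, seed=1):
    """Simple generational GA over feature bit masks."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    best, best_fit, stale = None, float("-inf"), 0
    for _ in range(max_generations):
        scored = sorted(population, key=lambda m: fitness(m, data), reverse=True)
        top_fit = fitness(scored[0], data)
        if top_fit > best_fit:
            best, best_fit, stale = scored[0][:], top_fit, 0
        else:
            stale += 1
        if stale >= patience:  # termination: no improvement for a while
            break
        parents = scored[: pop_size // 2]  # truncation selection
        children = [scored[0][:]]          # elitism: keep the current best
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randint(1, n_features - 1)  # one-point crossover
            child = a[:cut] + b[cut:]
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        population = children
    return best, best_fit


X, y = make_data()
data = (X[:80], y[:80], X[80:], y[80:])  # simple hold-out split
mask, score = select_features(data, n_features=len(X[0]))
print("selected features:", [j for j, keep in enumerate(mask) if keep])
print("fitness:", round(score, 3))
```

Scoring on a single hold-out split keeps the sketch short. A real study would more likely use cross-validation inside the fitness function so that the selected features are not tuned to one particular split. Varying `pop_size`, `max_generations`, and `patience` corresponds loosely to the different initial populations and termination conditions the abstract mentions.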