Identifying (Quasi) Equally Informative Subsets in Feature Selection Problems for Classification: A Max-Relevance Min-Redundancy Approach
- 6 July 2015
- journal article
- Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Cybernetics
- Vol. 46 (6), 1424-1437
- https://doi.org/10.1109/tcyb.2015.2444435
Abstract
An emerging trend in feature selection is the development of two-objective algorithms that analyze the tradeoff between the number of features and the classification performance of the model built with these features. Since these two objectives conflict, a typical result is a set of Pareto-efficient subsets, each with a different cardinality and a corresponding discriminating power. However, this approach overlooks the fact that, for a given cardinality, there can be several subsets with similar information content. The study reported here addresses this problem and introduces a novel multiobjective feature selection approach conceived to identify: 1) a subset that maximizes the performance of a given classifier and 2) a set of subsets that are quasi equally informative, i.e., whose classification performance is nearly equal to that of the performance-maximizing subset. The approach consists of a wrapper [Wrapper for Quasi Equally Informative Subset Selection (W-QEISS)] built on the formulation of a four-objective optimization problem, which aims to maximize the accuracy of a classifier, minimize the number of features, and optimize two entropy-based measures of relevance and redundancy. This allows the search to be conducted in a larger space, thus enabling the wrapper to generate a large number of Pareto-efficient solutions. The algorithm is compared against the mRMR algorithm, a two-objective wrapper, and a computationally efficient filter [Filter for Quasi Equally Informative Subset Selection (F-QEISS)] on 24 University of California, Irvine (UCI) datasets covering both binary and multiclass classification. Experimental results show that W-QEISS has the capability of evolving a rich and diverse set of Pareto-efficient solutions, and that their availability helps in: 1) studying the tradeoff between multiple measures of classification performance and 2) understanding the relative importance of each feature.
The quasi equally informative subsets are identified at the cost of a marginal increase in computational time, thanks to the adoption of the Borg Multiobjective Evolutionary Algorithm and the Extreme Learning Machine as the global optimization and learning algorithms, respectively.
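The two entropy-based objectives mentioned in the abstract can be illustrated with a small sketch. The code below is not the paper's implementation; it is a minimal, assumed formulation in the max-relevance min-redundancy spirit, where relevance is the mean mutual information between each selected (discrete) feature and the class label, and redundancy is the mean pairwise mutual information among the selected features. Function names are illustrative only.

```python
from collections import Counter
from math import log2

def entropy(xs):
    """Shannon entropy (bits) of a discrete sequence."""
    n = len(xs)
    return -sum(c / n * log2(c / n) for c in Counter(xs).values())

def mutual_info(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for paired discrete sequences."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))

def relevance(feature_cols, y):
    """Mean MI between each selected feature and the class label
    (to be maximized in an mRMR-style formulation)."""
    return sum(mutual_info(col, y) for col in feature_cols) / len(feature_cols)

def redundancy(feature_cols):
    """Mean pairwise MI among the selected features
    (to be minimized in an mRMR-style formulation)."""
    k = len(feature_cols)
    if k < 2:
        return 0.0
    pairs = [(i, j) for i in range(k) for j in range(i + 1, k)]
    return sum(mutual_info(feature_cols[i], feature_cols[j])
               for i, j in pairs) / len(pairs)
```

In a wrapper such as W-QEISS, objectives like these would be evaluated for each candidate subset alongside classifier accuracy and subset cardinality, giving the four-objective search space described above.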
Funding Information
- Singapore University of Technology and Design (SRG ESD 2012 033, SRG ESD 2013 061)
This publication has 36 references indexed in Scilit:
- A fast learning algorithm for multi-layer extreme learning machine. IEEE, 2014
- Sparse Extreme Learning Machine for Classification. IEEE Transactions on Cybernetics, 2014
- Particle Swarm Optimization for Feature Selection in Classification: A Multi-Objective Approach. IEEE Transactions on Cybernetics, 2012
- Extreme Learning Machine for Regression and Multiclass Classification. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 2012
- Multivariate Density Estimation and Visualization. Springer Science and Business Media LLC, 2011
- Extreme learning machines: a survey. International Journal of Machine Learning and Cybernetics, 2011
- A systematic analysis of performance measures for classification tasks. Information Processing & Management, 2009
- Selection of input variables for data driven models: An average shifted histogram partial mutual information estimator approach. Journal of Hydrology, 2009
- Extremely randomized trees. Machine Learning, 2006
- The Relationship Between Variable Selection and Data Agumentation and a Method for Prediction. Technometrics, 1974