Identifying (Quasi) Equally Informative Subsets in Feature Selection Problems for Classification: A Max-Relevance Min-Redundancy Approach
- 6 July 2015
- journal article
- Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Cybernetics
- Vol. 46 (6), 1424-1437
- https://doi.org/10.1109/tcyb.2015.2444435
Abstract
An emerging trend in feature selection is the development of two-objective algorithms that analyze the tradeoff between the number of features and the classification performance of the model built with these features. Since these two objectives conflict, a typical result is a set of Pareto-efficient subsets, each with a different cardinality and a corresponding discriminating power. However, this approach overlooks the fact that, for a given cardinality, there can be several subsets with similar information content. The study reported here addresses this problem and introduces a novel multiobjective feature selection approach conceived to identify: 1) a subset that maximizes the performance of a given classifier and 2) a set of subsets that are quasi equally informative, i.e., whose classification performance is nearly equal to that of the performance-maximizing subset. The approach consists of a wrapper [Wrapper for Quasi Equally Informative Subset Selection (W-QEISS)] built on the formulation of a four-objective optimization problem, which aims to maximize the accuracy of a classifier, minimize the number of features, and optimize two entropy-based measures of relevance and redundancy. This allows the search to be conducted in a larger space, thus enabling the wrapper to generate a large number of Pareto-efficient solutions. The algorithm is compared against the mRMR algorithm, a two-objective wrapper, and a computationally efficient filter [Filter for Quasi Equally Informative Subset Selection (F-QEISS)] on 24 University of California, Irvine (UCI) datasets covering both binary and multiclass classification. Experimental results show that W-QEISS has the capability of evolving a rich and diverse set of Pareto-efficient solutions, and that their availability helps in: 1) studying the tradeoff between multiple measures of classification performance and 2) understanding the relative importance of each feature.
The quasi equally informative subsets are identified at the cost of a marginal increase in computational time, thanks to the adoption of the Borg Multiobjective Evolutionary Algorithm and the Extreme Learning Machine as the global optimization and learning algorithms, respectively.
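The two entropy-based objectives mentioned in the abstract can be illustrated with a small sketch. The code below is not the paper's implementation; it is a minimal, assumed formulation in the max-relevance min-redundancy spirit, where relevance is the mean mutual information between each selected (discrete) feature and the class label, and redundancy is the mean pairwise mutual information among the selected features. Function names are illustrative only.

```python
from collections import Counter
from math import log2

def entropy(xs):
    """Shannon entropy (bits) of a discrete sequence."""
    n = len(xs)
    return -sum(c / n * log2(c / n) for c in Counter(xs).values())

def mutual_info(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for paired discrete sequences."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))

def relevance(feature_cols, y):
    """Mean MI between each selected feature and the class label
    (to be maximized in an mRMR-style formulation)."""
    return sum(mutual_info(col, y) for col in feature_cols) / len(feature_cols)

def redundancy(feature_cols):
    """Mean pairwise MI among the selected features
    (to be minimized in an mRMR-style formulation)."""
    k = len(feature_cols)
    if k < 2:
        return 0.0
    pairs = [(i, j) for i in range(k) for j in range(i + 1, k)]
    return sum(mutual_info(feature_cols[i], feature_cols[j])
               for i, j in pairs) / len(pairs)
```

In a wrapper such as W-QEISS, objectives like these would be evaluated for each candidate subset alongside classifier accuracy and subset cardinality, giving the four-objective search space described above.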
Funding Information
- Singapore University of Technology and Design (SRG ESD 2012 033, SRG ESD 2013 061)
This publication has 36 references indexed in Scilit:
- A fast learning algorithm for multi-layer extreme learning machine. IEEE, 2014
- Sparse Extreme Learning Machine for Classification. IEEE Transactions on Cybernetics, 2014
- Particle Swarm Optimization for Feature Selection in Classification: A Multi-Objective Approach. IEEE Transactions on Cybernetics, 2012
- Extreme Learning Machine for Regression and Multiclass Classification. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 2012
- Multivariate Density Estimation and Visualization. Springer Science and Business Media LLC, 2011
- Extreme learning machines: a survey. International Journal of Machine Learning and Cybernetics, 2011
- A systematic analysis of performance measures for classification tasks. Information Processing & Management, 2009
- Selection of input variables for data driven models: An average shifted histogram partial mutual information estimator approach. Journal of Hydrology, 2009
- Extremely randomized trees. Machine Learning, 2006
- The Relationship Between Variable Selection and Data Agumentation and a Method for Prediction. Technometrics, 1974