Unsupervised feature selection using feature similarity
- 7 August 2002
- journal article
- Published by the Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Pattern Analysis and Machine Intelligence
- Vol. 24 (3), 301-312
- https://doi.org/10.1109/34.990133
Abstract
In this article, we describe an unsupervised feature selection algorithm suitable for data sets that are large in both dimension and size. The method is based on measuring similarity between features, whereby redundant features are removed. It requires no search and is therefore fast. A new feature similarity measure, called the maximum information compression index, is introduced. The algorithm is generic in nature and supports multiscale representation of data sets. Its superiority, in terms of speed and performance, is established extensively over various real-life data sets of different sizes and dimensions. We also demonstrate how redundancy and information loss in feature selection can be quantified with an entropy measure.
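To make the similarity measure concrete, the maximum information compression index for a pair of features can be taken as the smallest eigenvalue of their 2x2 covariance matrix: it is zero when the two features are linearly dependent and grows with the information lost by projecting them onto a single direction. The sketch below illustrates this on synthetic data; it is a minimal illustration of the measure only, not the paper's full clustering-based selection scheme, and the feature names are invented for the example.

```python
import numpy as np

def mici(x, y):
    """Maximum information compression index for two features:
    the smallest eigenvalue of their 2x2 covariance matrix.
    Near zero => x and y are (almost) linearly dependent, i.e. redundant."""
    cov = np.cov(x, y)                 # 2x2 covariance matrix of the pair
    eigvals = np.linalg.eigvalsh(cov)  # eigenvalues in ascending order
    return eigvals[0]

# Toy illustration: a redundant pair scores near zero,
# an independent pair scores substantially higher.
rng = np.random.default_rng(0)
f1 = rng.normal(size=200)
f2 = 2.0 * f1 + 0.01 * rng.normal(size=200)  # nearly a linear copy of f1
f3 = rng.normal(size=200)                    # unrelated feature

print(mici(f1, f2))  # close to 0: f2 is redundant given f1
print(mici(f1, f3))  # much larger: f3 carries new information
```

In the paper's setting, such pairwise scores drive the selection: features whose nearest neighbours (in MICI terms) are very close are treated as redundant and discarded without any combinatorial search over feature subsets.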