Unsupervised feature selection using feature similarity
- 7 August 2002
- journal article
- Published by the Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Pattern Analysis and Machine Intelligence
- Vol. 24 (3), 301-312
- https://doi.org/10.1109/34.990133
Abstract
In this article, we describe an unsupervised feature selection algorithm suitable for data sets that are large in both dimension and size. The method is based on measuring similarity between features, whereby redundant features are removed. It requires no search and is therefore fast. A new feature similarity measure, called the maximum information compression index, is introduced. The algorithm is generic in nature and supports multiscale representation of data sets. Its superiority, in terms of speed and performance, is established extensively over various real-life data sets of different sizes and dimensions. We also demonstrate how redundancy and information loss in feature selection can be quantified with an entropy measure.
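To make the similarity measure concrete, the maximum information compression index for a pair of features can be taken as the smallest eigenvalue of their 2x2 covariance matrix: it is zero when the two features are linearly dependent and grows with the information lost by projecting them onto a single direction. The sketch below illustrates this on synthetic data; it is a minimal illustration of the measure only, not the paper's full clustering-based selection scheme, and the feature names are invented for the example.

```python
import numpy as np

def mici(x, y):
    """Maximum information compression index for two features:
    the smallest eigenvalue of their 2x2 covariance matrix.
    Near zero => x and y are (almost) linearly dependent, i.e. redundant."""
    cov = np.cov(x, y)                 # 2x2 covariance matrix of the pair
    eigvals = np.linalg.eigvalsh(cov)  # eigenvalues in ascending order
    return eigvals[0]

# Toy illustration: a redundant pair scores near zero,
# an independent pair scores substantially higher.
rng = np.random.default_rng(0)
f1 = rng.normal(size=200)
f2 = 2.0 * f1 + 0.01 * rng.normal(size=200)  # nearly a linear copy of f1
f3 = rng.normal(size=200)                    # unrelated feature

print(mici(f1, f2))  # close to 0: f2 is redundant given f1
print(mici(f1, f3))  # much larger: f3 carries new information
```

In the paper's setting, such pairwise scores drive the selection: features whose nearest neighbours (in MICI terms) are very close are treated as redundant and discarded without any combinatorial search over feature subsets.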