KB-CB-N classification: Towards unsupervised approach for supervised learning

Abstract

Data classification has attracted considerable research attention in computational statistics and data mining because of its wide range of applications. K Best Cluster Based Neighbour (KB-CB-N) is our novel classification technique, which integrates three different similarity measures for cluster-based classification. The basic principle is to apply unsupervised learning to the instances of each class in the dataset and then use the output as input to a classification algorithm that finds the K best neighbouring clusters from the density, gravity and distance perspectives. Clustering is applied as an initial step within each class to find the inherent in-class groupings in the dataset. Different data clustering techniques use different similarity measures, and each measure has its own strengths and weaknesses. Combining the three measures therefore draws on the strengths of each while avoiding the problems encountered when relying on any single measure. Extensive experiments on eight real datasets show that our new technique typically achieves improved or equivalent performance compared with other existing state-of-the-art classification methods.
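The abstract names the two stages (per-class clustering, then selecting the K best clusters by density, gravity and distance) but does not define the measures or how they are combined. The sketch below is a minimal illustration under assumptions made here for clarity, not the authors' implementation. Each class is clustered with plain k-means. Distance is the Euclidean distance from the query to a cluster centroid. Gravity is cluster size divided by squared distance (a Newtonian analogy). Density is the fraction of a cluster's members lying within that cluster's mean radius of the query. The three rankings are combined by summing ranks, and the K best clusters cast votes weighted by their combined rank. The names `kmeans`, `fit`, `predict` and `clusters_per_class` are hypothetical.

```python
import math
from collections import defaultdict


def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means with deterministic init (first k points); illustrative only."""
    k = min(k, len(points))
    centroids = [list(p) for p in points[:k]]
    assign = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest centroid.
        assign = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Move each centroid to the mean of its members (keep it if the cluster is empty).
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    groups = [[p for p, a in zip(points, assign) if a == c] for c in range(k)]
    return [(centroids[c], groups[c]) for c in range(k) if groups[c]]


def fit(X, y, clusters_per_class=2):
    """Stage 1: cluster the instances of each class separately (in-class grouping)."""
    by_class = defaultdict(list)
    for x, label in zip(X, y):
        by_class[label].append(x)
    model = []
    for label, pts in by_class.items():
        for centroid, members in kmeans(pts, clusters_per_class):
            # Mean distance of members to the centroid, used by the density score.
            radius = sum(math.dist(m, centroid) for m in members) / len(members)
            model.append({"label": label, "centroid": centroid,
                          "members": members, "radius": radius})
    return model


def predict(model, x, k=3, eps=1e-9):
    """Stage 2: rank clusters by three assumed scores and vote among the K best."""
    dist = [math.dist(x, c["centroid"]) for c in model]
    # Higher is better for every score (assumed definitions, see lead-in).
    scores = [
        [-d for d in dist],                                            # distance
        [len(c["members"]) / (d * d + eps) for c, d in zip(model, dist)],  # gravity
        [sum(math.dist(m, x) <= c["radius"] + eps for m in c["members"])
         / len(c["members"]) for c in model],                          # density
    ]
    # Sum each cluster's rank (0 = best) across the three measures.
    total = [0] * len(model)
    for vals in scores:
        order = sorted(range(len(model)), key=lambda i: -vals[i])
        for rank, i in enumerate(order):
            total[i] += rank
    best = sorted(range(len(model)), key=lambda i: total[i])[:k]
    # Weighted vote: clusters with a lower combined rank count more.
    votes = defaultdict(float)
    for i in best:
        votes[model[i]["label"]] += 1.0 / (1 + total[i])
    return max(votes, key=votes.get)
```

On a toy set with two well-separated classes, each split into two groups, `fit(X, y, clusters_per_class=2)` yields four clusters, and a query near one group is labelled with that group's class. Rank summation was chosen so that the three measures, which live on very different scales, contribute equally without any normalisation step; a weighted score would be an equally plausible reading of the abstract.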
