Comparison of clustering techniques for measuring similarity in articles
- 1 February 2017
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE) in 2017 3rd International Conference on Computational Intelligence & Communication Technology (CICT)
Abstract
Clustering groups objects so that the members of each cluster are similar to one another. This paper focuses on two clustering techniques, hierarchical clustering and k-means clustering, and compares several similarity measures to find the best one for each. The work begins by selecting categories of textual content, and for each category articles are collected from various news channels. The search words most relevant to each category are identified, and these words are fed to a program that builds a matrix of words. This matrix is then processed in Matlab using the different similarity measures. The results are evaluated with the cophenetic correlation coefficient and the silhouette value to determine the best similarity measure. Five categories are analyzed, “Business”, “Education”, “Election”, “Entertainment” and “Game”, with 28 news articles selected per category from various news channels. The numbers of search words selected for these categories are 35, 49, 25, 30 and 35, respectively. The paper concludes that 'Cityblock' is the best measure for hierarchical clustering and 'Correlation' is the best for k-means clustering, with 'Cityblock' in second place for k-means.
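To make the two winning measures and the silhouette value concrete, the following is a minimal sketch in Python rather than the Matlab used in the paper. It is not the authors' implementation: the function names, the toy word-count matrix and the cluster labels are all hypothetical, and the cophenetic correlation coefficient is not covered here. Each row stands for one article and each column for the count of one search word.

```python
import math

def cityblock(u, v):
    """Cityblock (Manhattan, L1) distance between two equal-length vectors."""
    return sum(abs(a - b) for a, b in zip(u, v))

def correlation_distance(u, v):
    """One minus the Pearson correlation of two non-constant vectors."""
    mean_u, mean_v = sum(u) / len(u), sum(v) / len(v)
    du = [a - mean_u for a in u]
    dv = [b - mean_v for b in v]
    num = sum(a * b for a, b in zip(du, dv))
    den = math.sqrt(sum(a * a for a in du) * sum(b * b for b in dv))
    return 1.0 - num / den

def mean_silhouette(rows, labels, dist):
    """Average silhouette value of a labelling under a given distance."""
    scores = []
    for i, row in enumerate(rows):
        # Collect distances from row i to every other row, grouped by cluster.
        by_cluster = {}
        for j, other in enumerate(rows):
            if j != i:
                by_cluster.setdefault(labels[j], []).append(dist(row, other))
        own = by_cluster.pop(labels[i], [])
        if not own or not by_cluster:
            # Singleton cluster or a single cluster overall: score taken as 0.
            scores.append(0.0)
            continue
        a = sum(own) / len(own)  # mean distance within the row's own cluster
        b = min(sum(d) / len(d) for d in by_cluster.values())  # nearest other cluster
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)

# Hypothetical toy matrix: 4 articles x 4 search words (not data from the paper).
articles = [
    [5, 3, 0, 0],
    [4, 4, 1, 0],
    [0, 1, 5, 4],
    [0, 0, 4, 5],
]
good_labels = [0, 0, 1, 1]  # assumed cluster assignment, e.g. from k-means
bad_labels = [0, 1, 0, 1]   # deliberately poor assignment for comparison

sil_cityblock = mean_silhouette(articles, good_labels, cityblock)
sil_correlation = mean_silhouette(articles, good_labels, correlation_distance)
sil_bad = mean_silhouette(articles, bad_labels, cityblock)
```

A silhouette value close to 1 indicates that articles sit much nearer to their own cluster than to any other, so comparing `sil_cityblock` with `sil_correlation` on the same labelling mirrors, in miniature, how the paper ranks similarity measures for k-means.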