Simple K-Medoids Partitioning Algorithm for Mixed Variable Data
Open Access
- 24 August 2019
- journal article
- research article
- Published by MDPI AG in Algorithms
- Vol. 12 (9), 177
- https://doi.org/10.3390/a12090177
Abstract
A simple and fast k-medoids algorithm that updates medoids by minimizing the total distance within clusters has been developed. Although it is simple and fast, as its name suggests, it neglects the local optima and empty clusters that may arise. Because the algorithm takes distances as input, a generalized distance function is developed to increase the variation of the distances, especially for mixed variable datasets. This variation is crucial for a partitioning algorithm, because different distances produce different outcomes. In experiments, the simple k-medoids algorithm performs consistently well across various settings of mixed variable data, and it achieves high cluster accuracy compared with other distance-based partitioning algorithms for mixed variable data.
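The abstract describes a k-medoids variant driven by a precomputed distance matrix, in which each medoid is updated to the cluster member that minimizes the total within-cluster distance. The Python sketch below illustrates that general idea under stated assumptions; it is not the authors' implementation. The initialization (take the k objects with the smallest normalized distance sum v_j) reflects our reading of "A simple and fast algorithm for K-medoids clustering" (2009) and should be checked against that paper. Gower's coefficient ("A General Coefficient of Similarity and Some of Its Properties", 1971) is swapped in as the mixed-variable distance; the article's own generalized distance function is not reproduced here. The function names `gower_distance_matrix` and `simple_kmedoids`, and the rule of keeping the old medoid when a cluster becomes empty, are hypothetical choices.

```python
from typing import List, Optional, Sequence, Tuple, Union

Value = Union[float, str]
Matrix = List[List[float]]


def gower_distance_matrix(rows: Sequence[Sequence[Value]],
                          is_numeric: Sequence[bool]) -> Matrix:
    """Gower-style dissimilarity (illustrative stand-in for a mixed-data distance).

    Numeric column: |a - b| / column range. Categorical column: 0 if equal, else 1.
    The pairwise distance is the unweighted mean over all columns.
    """
    n, p = len(rows), len(is_numeric)
    ranges: List[Optional[float]] = []
    for j in range(p):
        if is_numeric[j]:
            col = [float(r[j]) for r in rows]
            ranges.append(max(col) - min(col))
        else:
            ranges.append(None)
    d = [[0.0] * n for _ in range(n)]
    for a in range(n):
        for b in range(a + 1, n):
            total = 0.0
            for j in range(p):
                rng = ranges[j]
                if rng is None:  # categorical: simple mismatch
                    total += 0.0 if rows[a][j] == rows[b][j] else 1.0
                elif rng > 0:  # numeric: range-normalised absolute difference
                    total += abs(float(rows[a][j]) - float(rows[b][j])) / rng
                # a constant numeric column (range 0) contributes nothing
            d[a][b] = d[b][a] = total / p
    return d


def simple_kmedoids(d: Matrix, k: int,
                    max_iter: int = 100) -> Tuple[List[int], List[int], float]:
    """Medoid-update k-medoids on a precomputed distance matrix (sketch)."""
    n = len(d)
    row_sums = [sum(row) for row in d]
    # Initialisation (assumed): pick the k objects with the smallest
    # v_j = sum_i d_ij / sum_l d_il, i.e. the most "central" objects.
    v = [sum(d[i][j] / row_sums[i] for i in range(n) if row_sums[i] > 0)
         for j in range(n)]
    medoids = sorted(range(n), key=lambda j: v[j])[:k]

    def assign(meds: List[int]) -> Tuple[List[int], float]:
        # Each object joins the cluster of its nearest medoid.
        labels = [min(range(k), key=lambda c: d[i][meds[c]]) for i in range(n)]
        return labels, sum(d[i][meds[labels[i]]] for i in range(n))

    labels, cost = assign(medoids)
    for _ in range(max_iter):
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:  # empty cluster: keep the old medoid (assumption)
                new_medoids.append(medoids[c])
                continue
            # New medoid = member minimising total distance within the cluster.
            new_medoids.append(min(members,
                                   key=lambda m: sum(d[m][i] for i in members)))
        new_labels, new_cost = assign(new_medoids)
        if new_cost >= cost:  # total within-cluster distance stopped decreasing
            break
        medoids, labels, cost = new_medoids, new_labels, new_cost
    return medoids, labels, cost


# Usage: two obvious groups in one numeric and one categorical column.
rows = [[1.0, "a"], [1.2, "a"], [0.9, "a"], [10.0, "b"], [10.3, "b"], [9.8, "b"]]
d = gower_distance_matrix(rows, is_numeric=[True, False])
medoids, labels, cost = simple_kmedoids(d, k=2)
print(medoids, labels)  # [0, 3] [0, 0, 0, 1, 1, 1]
```

Because the medoid update only needs pairwise distances, trying a different mixed-variable distance changes only the matrix passed to `simple_kmedoids`; as the abstract notes, different distances can produce different partitions.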
References (19 indexed in Scilit; 10 listed):
- An improved k-prototypes clustering algorithm for mixed numeric and categorical data. Neurocomputing, 2013.
- A k-means type clustering algorithm for subspace clustering of mixed numeric and categorical datasets. Pattern Recognition Letters, 2011.
- A simple and fast algorithm for K-medoids clustering. Expert Systems with Applications, 2009.
- Distance functions for categorical and mixed variables. Pattern Recognition Letters, 2008.
- Top 10 algorithms in data mining. Knowledge and Information Systems, 2007.
- A k-mean clustering algorithm for mixed numeric and categorical data. Data & Knowledge Engineering, 2007.
- Simultaneous clustering of gene expression data with clinical chemistry and pathological evaluations reveals phenotypic prototypes. BMC Systems Biology, 2007.
- Clustering Objects on Subsets of Attributes (with Discussion). Journal of the Royal Statistical Society Series B: Statistical Methodology, 2004.
- Local Optima in K-Means Clustering: What You Don't Know May Hurt You. Psychological Methods, 2003.
- A General Coefficient of Similarity and Some of Its Properties. Biometrics, 1971.