Simple K-Medoids Partitioning Algorithm for Mixed Variable Data

Open Access

24 August 2019

journal article
research article
Published by MDPI AG in Algorithms

Vol. 12 (9), 177
https://doi.org/10.3390/a12090177

Abstract

A simple and fast k-medoids algorithm that updates medoids by minimizing the total distance within clusters has been developed. Although it is simple and fast, as its name suggests, it neglects the local optima and empty clusters that may arise. Because the algorithm takes distances as its input, a generalized distance function is developed to increase the variation of the distances, especially for mixed variable datasets. This variation is a crucial part of a partitioning algorithm because different distances produce different outcomes. In experiments, the simple k-medoids algorithm performs consistently well across various settings of mixed variable data. It also achieves high cluster accuracy compared with other distance-based partitioning algorithms for mixed variable data.
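The medoid update described in the abstract (pick, within each cluster, the object with the smallest total distance to the other members) can be illustrated with a minimal sketch. This is not the authors' implementation: the Gower-style distance below (range-normalized differences for numeric columns, simple mismatch for categorical columns) is a stand-in for the paper's generalized distance function, the function names `mixed_distance`, `distance_matrix`, `assign`, and `k_medoids` are hypothetical, and the sketch makes no attempt to handle local optima or empty clusters beyond keeping each medoid in its own cluster.

```python
def mixed_distance(a, b, numeric_ranges):
    """Gower-style distance (an assumption, not the paper's function).

    numeric_ranges maps a numeric column index to that column's range;
    every other column is treated as categorical.
    """
    total = 0.0
    for j, (x, y) in enumerate(zip(a, b)):
        if j in numeric_ranges:
            r = numeric_ranges[j]
            total += abs(x - y) / r if r > 0 else 0.0
        else:
            total += 0.0 if x == y else 1.0
    return total / len(a)


def distance_matrix(rows, numeric_cols):
    """Precompute all pairwise distances; the algorithm only sees this matrix."""
    ranges = {
        j: max(r[j] for r in rows) - min(r[j] for r in rows)
        for j in numeric_cols
    }
    n = len(rows)
    return [
        [mixed_distance(rows[i], rows[k], ranges) for k in range(n)]
        for i in range(n)
    ]


def assign(dist, medoids):
    """Assign every object to its nearest medoid."""
    labels = [
        min(range(len(medoids)), key=lambda c: dist[i][medoids[c]])
        for i in range(len(dist))
    ]
    # Keep each medoid in its own cluster so no cluster starts out empty.
    for c, m in enumerate(medoids):
        labels[m] = c
    return labels


def k_medoids(dist, initial_medoids, max_iter=100):
    """Alternate assignment and medoid update until the medoids stop changing."""
    medoids = list(initial_medoids)
    for _ in range(max_iter):
        labels = assign(dist, medoids)
        new_medoids = []
        for c in range(len(medoids)):
            members = [i for i, lab in enumerate(labels) if lab == c]
            # New medoid: the member with the smallest total distance
            # to all members of its cluster.
            new_medoids.append(
                min(members, key=lambda i: sum(dist[i][j] for j in members))
            )
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(dist, medoids)


# Toy mixed dataset: one numeric column, one categorical column.
rows = [(1.0, "a"), (1.2, "a"), (0.9, "a"), (5.0, "b"), (5.3, "b"), (4.8, "b")]
dist = distance_matrix(rows, numeric_cols=[0])
medoids, labels = k_medoids(dist, initial_medoids=[1, 4])
print(medoids, labels)
```

Because the clustering routine receives only the precomputed matrix, swapping in a different distance function changes the partition without touching `k_medoids`, which is the sense in which different distances produce different outcomes.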


Cited by 33 articles