Comparison of Distance Methods in K-Means Algorithm for Determining Village Status in Bekasi District

Abstract
The Bekasi Regency government reports that around 21 slum villages are spread across seven sub-districts of Bekasi Regency, which indicates a need for funding and development assistance. To support village development, the central and regional governments must provide information about which villages should be prioritized. The 2014 Village Potential (“Potensi Desa”) statistics dataset (Podes 2014) for Bekasi Regency was released by the Central Bureau of Statistics as unsupervised (unlabeled) data covering 182 villages and 41 indicators. Podes 2014 data are collected at the village level throughout Indonesia, with the village as the unit of analysis. Using the k-means algorithm, village status in Bekasi Regency can be determined. K-means clusters the data by computing the distance from each data point to the centroids and assigning the point to the nearest one. This study compares three distance calculation methods in k-means: Manhattan, Euclidean, and Chebyshev. The methods are evaluated using the Davies-Bouldin index and execution time. Based on the test results, the Euclidean metric yields the best Davies-Bouldin index value and a more efficient execution time than the Manhattan and Chebyshev metrics.
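The comparison described in the abstract can be illustrated with a minimal sketch in pure Python. It is not the authors' implementation: the function names (`kmeans`, `davies_bouldin`), the mean-based centroid update used for all three metrics, the choice to compute the Davies-Bouldin index with the same metric used for clustering, and the toy two-dimensional points are all assumptions made for illustration. The Podes 2014 data are not used here.

```python
def manhattan(a, b):
    # Sum of absolute coordinate differences (L1 distance).
    return sum(abs(x - y) for x, y in zip(a, b))

def euclidean(a, b):
    # Square root of the sum of squared differences (L2 distance).
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def chebyshev(a, b):
    # Largest absolute coordinate difference (L-infinity distance).
    return max(abs(x - y) for x, y in zip(a, b))

def kmeans(points, init_centroids, dist, iters=100):
    # Assumption: centroids are updated as coordinate-wise means for
    # every metric; only the assignment step uses `dist`.
    centroids = [list(c) for c in init_centroids]
    k = len(centroids)
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels, centroids

def davies_bouldin(points, labels, centroids, dist):
    # Lower is better. Assumes at least two clusters with distinct
    # centroids; an empty cluster gets a scatter of 0.
    k = len(centroids)
    scatter = []
    for j in range(k):
        members = [p for p, l in zip(points, labels) if l == j]
        scatter.append(sum(dist(p, centroids[j]) for p in members) / len(members)
                       if members else 0.0)
    worst = [max((scatter[i] + scatter[j]) / dist(centroids[i], centroids[j])
                 for j in range(k) if j != i)
             for i in range(k)]
    return sum(worst) / k

# Hypothetical toy data: two well-separated groups of points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
init = [(0, 0), (10, 10)]
results = {}
for name, dist in [("manhattan", manhattan), ("euclidean", euclidean),
                   ("chebyshev", chebyshev)]:
    labels, centroids = kmeans(points, init, dist)
    results[name] = (labels, davies_bouldin(points, labels, centroids, dist))
```

On such a toy set all three metrics recover the same two groups, so the sketch only shows the mechanics. Measuring execution time, as the study does, could be added by wrapping each `kmeans` call with `time.perf_counter()`.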
