Preprocessing kNN algorithm classification using K-means and distance matrix with students’ academic performance dataset
Published: 21 October 2020
Jurnal Teknologi dan Sistem Komputer, Volume 8, pp 311-316; doi:10.14710/jtsiskom.2020.13874
Abstract: Outliers in a dataset can lower the accuracy of a classification process. They can be removed in a preprocessing stage before the classification algorithm is applied, and clustering can serve as the outlier detection method. This study applies K-means and a distance matrix to detect outliers and remove them from datasets with class labels. The research used a dataset of students’ academic performance comprising 6847 instances with 18 attributes and 3 class labels. The preprocessing applies the K-means method to obtain a centroid for each class. The distance matrix is then used to evaluate the distance from each instance to the centroids. Outliers, namely instances that fall in a different class, are removed from the dataset. This preprocessing improves the classification accuracy of the kNN algorithm. Data without preprocessing yields 72.28% accuracy, data preprocessed using K-means with Euclidean distance yields 98.42% accuracy (an increase of 26.14 percentage points), and data preprocessed using K-means with Manhattan distance yields 97.76% accuracy (an increase of 25.48 percentage points).
Keywords: algorithm / outliers / distance / preprocessing / matrix / centroid / K-means / instance / classification accuracy
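The preprocessing described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state how K-means is configured, so the sketch assumes a single centroid per class (in which case K-means converges to the class mean) and assumes that an instance counts as an outlier when its nearest centroid belongs to a class other than its own label. The function names and the toy data are hypothetical.

```python
import math
from collections import defaultdict


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Manhattan (city-block) distance between two numeric vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def class_centroids(X, y):
    """Assumption: one centroid per class, i.e. the mean of its instances."""
    groups = defaultdict(list)
    for row, label in zip(X, y):
        groups[label].append(row)
    return {
        label: [sum(col) / len(rows) for col in zip(*rows)]
        for label, rows in groups.items()
    }


def remove_outliers(X, y, dist=euclidean):
    """Drop instances whose nearest class centroid is not their own class."""
    centroids = class_centroids(X, y)
    labels = sorted(centroids)
    # Distance matrix: one row per instance, one column per class centroid.
    matrix = [[dist(row, centroids[c]) for c in labels] for row in X]
    keep = [
        i
        for i, (distances, label) in enumerate(zip(matrix, y))
        if labels[distances.index(min(distances))] == label
    ]
    return [X[i] for i in keep], [y[i] for i in keep]


# Hypothetical toy data: the last instance is labelled "low" but sits
# in the middle of the "high" group.
X = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
     [5.0, 5.0], [5.2, 4.9], [4.8, 5.1],
     [4.9, 5.0]]
y = ["low", "low", "low", "high", "high", "high", "low"]

X_clean, y_clean = remove_outliers(X, y, dist=euclidean)
X_clean_m, y_clean_m = remove_outliers(X, y, dist=manhattan)
```

The cleaned data would then be passed to a kNN classifier in place of the raw data. Swapping `dist` between `euclidean` and `manhattan` mirrors the two variants compared in the abstract.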