Implementation of Sample Bootstrapping for Resampling Pap Smear Single Cell Dataset

Abstract
  The purpose of this study was to determine the effect of using Sample Bootstrapping to resample the Harlev dataset on the performance of single-cell pap smear classification, by addressing the data imbalance problem. The Harlev dataset used in this study consists of 917 records with 20 attributes. The class labels in the dataset are imbalanced, which affected single-cell pap smear classification performance. Class imbalance causes machine learning algorithms to perform poorly on the minority classes because they are overwhelmed by the majority class. To overcome this, the data can be resampled with Sample Bootstrapping. The results of Sample Bootstrapping were evaluated using the Artificial Neural Network and K-Nearest Neighbors classification methods. Both seven-class and two-class classification were performed. The classification results for both methods showed an increase in accuracy, precision, and recall. The performance improvement reached 10.82% for two-class classification and 35% for seven-class classification. It was concluded that Sample Bootstrapping is effective and robust in improving the performance of the classification methods.
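  As an illustration of the resampling idea described above, the following is a minimal sketch of bootstrap resampling (drawing with replacement) applied per class so that every class ends up with the same number of records. It is not the procedure used in this study: the function name bootstrap_balance, the choice to resample every class up to the size of the largest class, the fixed seed, and the toy data are all assumptions made for illustration only.

```python
import random
from collections import Counter, defaultdict


def bootstrap_balance(rows, labels, n_per_class=None, seed=0):
    """Draw a bootstrap sample (with replacement) of equal size from each class.

    Hypothetical helper for illustration; not the authors' implementation.
    """
    rng = random.Random(seed)

    # Group the records by their class label.
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)

    # Assumption: by default, every class is resampled to the size
    # of the largest class.
    if n_per_class is None:
        n_per_class = max(len(members) for members in by_class.values())

    new_rows, new_labels = [], []
    for label, members in by_class.items():
        # rng.choices samples with replacement, which is what makes
        # this a bootstrap sample rather than a plain subsample.
        drawn = rng.choices(members, k=n_per_class)
        new_rows.extend(drawn)
        new_labels.extend([label] * n_per_class)
    return new_rows, new_labels


# Toy imbalanced data with made-up values (not the Harlev dataset).
rows = [[0.1, 1.0], [0.2, 1.1], [0.3, 0.9], [0.4, 1.2],
        [0.5, 1.3], [0.6, 0.8], [2.0, 3.0], [2.1, 3.1]]
labels = ["normal"] * 6 + ["abnormal"] * 2

balanced_rows, balanced_labels = bootstrap_balance(rows, labels)
print(Counter(balanced_labels))
```

  In this sketch the minority class is drawn repeatedly from its own two records until it matches the majority class in size. If a classifier is evaluated on held-out data, the resampling would normally be applied to the training portion only, so that duplicated records do not appear in both the training and the test sets.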