Application of Data Mining with the K-Means Clustering Method and Davies-Bouldin Index for Grouping IMDb Movies

Abstract
With the development of technology, the film industry continues to grow, as shown by the number of films released both in cinemas and as TV shows. The Internet Movie Database (IMDb) is a website that provides information about films from around the world, including the people involved in making them. The information on IMDb ranges from actors and actresses, directors, and writers to the soundtracks used. In addition, IMDb is the most popular and trusted source of information on movies, TV, and celebrity content. This study investigates which film titles are most popular with the public by examining several parameters available on IMDb, such as the rating, score, certificate, and number of audience votes. The data were obtained from the Kaggle.com website. The data mining method used is K-Means clustering, and the Davies-Bouldin index is used to determine the optimal number of clusters. The K-Means algorithm groups the data by assigning each record to its nearest centroid. The attributes used for clustering are runtime, IMDb rating, meta score, number of votes, and gross. The results show that the highest average attribute value was 48.74 and that 4 clusters were formed. Evaluation using a confusion matrix yielded an accuracy of 100%.
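The abstract does not include the authors' implementation, so the following is only a minimal pure-Python sketch of the general procedure it describes: run K-Means for several candidate cluster counts and keep the count with the lowest Davies-Bouldin index (DBI). Several details are assumptions not stated in the abstract: features are min-max normalized (votes and gross are on much larger scales than rating), distances are Euclidean, centroids are initialized deterministically by farthest-point selection rather than at random, and the candidate range of 2 to 5 clusters is arbitrary. All function names are hypothetical, and the toy points are invented for illustration; they are not the Kaggle IMDb data.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def min_max_normalize(rows):
    """Assumed preprocessing: scale each column to [0, 1] so large-valued
    attributes (e.g. votes, gross) do not dominate the distances."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [[(v - lo) / (hi - lo) if hi > lo else 0.0
             for v, lo, hi in zip(row, lows, highs)] for row in rows]


def farthest_first_init(points, k):
    """Assumed deterministic init: start from the first point, then repeatedly
    add the point farthest from its nearest already-chosen centroid."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(euclidean(p, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's K-Means: assign each point to its nearest centroid, then move
    each centroid to the mean of its members, until assignments stop changing."""
    centroids = farthest_first_init(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: euclidean(p, centroids[j]))
                      for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == j]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centroids


def davies_bouldin(points, labels, centroids):
    """DBI = (1/k) * sum_i max_{j != i} (S_i + S_j) / d(c_i, c_j), where S_i is
    the mean distance of cluster i's members to its centroid. Lower is better.
    Assumes no two centroids coincide (otherwise the division fails)."""
    k = len(centroids)
    scatter = []
    for i in range(k):
        members = [p for p, lab in zip(points, labels) if lab == i]
        scatter.append(sum(euclidean(p, centroids[i]) for p in members) / len(members)
                       if members else 0.0)
    total = 0.0
    for i in range(k):
        total += max((scatter[i] + scatter[j]) / euclidean(centroids[i], centroids[j])
                     for j in range(k) if j != i)
    return total / k


def best_k(points, k_values):
    """Run K-Means for each candidate k and return the k with the lowest DBI."""
    scores = {}
    for k in k_values:
        labels, centroids = kmeans(points, k)
        scores[k] = davies_bouldin(points, labels, centroids)
    return min(scores, key=scores.get), scores


# Invented toy data: three well-separated groups of 2-D points.
toy = [[1.0, 1.0], [1.2, 0.9], [0.9, 1.1],
       [5.0, 5.0], [5.1, 4.8], [4.9, 5.2],
       [9.0, 1.0], [9.2, 1.1], [8.8, 0.9]]
points = min_max_normalize(toy)
best, scores = best_k(points, range(2, 6))
print(best, {k: round(s, 3) for k, s in scores.items()})
```

On real data the rows would hold the five clustering attributes (runtime, IMDb rating, meta score, number of votes, gross) instead of two toy coordinates; the selection logic stays the same. A library such as scikit-learn offers equivalent building blocks, but the plain version above keeps each step of the DBI formula visible.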