Web Scraping and Winnowing Algorithms for Plagiarism Detection of Final Project Titles

Abstract
Plagiarism in research can occur accidentally or intentionally. It violates copyright and harms others. When submitting research titles, for example for final project research, many students have their titles rejected repeatedly and are considered to have plagiarized because the proposed title already exists. A system is therefore needed that can detect the similarity between a title to be submitted and existing titles, with the aim of reducing the occurrence of plagiarism. This study uses the winnowing algorithm to compute the percentage similarity between titles. Google Scholar is used as the source of previously published research titles for comparison. Web scraping with cURL (Client URL) and Simple HTML DOM Parser is used to retrieve the title data from Google Scholar. The results show that applying the winnowing algorithm to data from Google Scholar yields a similarity percentage, classified as mild, moderate, or severe plagiarism, and supports early detection as a means of preventing plagiarism.
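
As an illustration of how a winnowing-based similarity percentage between two titles could be computed, the following is a minimal Python sketch, not the implementation used in this study. The function names, the k-gram length k = 5, the window size w = 4, the CRC32 hash, and the use of the Jaccard coefficient over the two fingerprint sets are all assumptions made for the example; the abstract does not specify these choices or the thresholds for the mild, moderate, and severe categories.

```python
import re
import zlib


def normalize(text):
    """Lowercase the text and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def kgram_hashes(text, k):
    """Hash every substring of length k (CRC32 is an assumed, deterministic choice)."""
    return [zlib.crc32(text[i:i + k].encode("utf-8"))
            for i in range(len(text) - k + 1)]


def winnow(hashes, w):
    """Keep the minimum hash of each window of w consecutive hashes."""
    if not hashes:
        return set()
    if len(hashes) < w:
        # Too short for a full window: fall back to the single smallest hash.
        return {min(hashes)}
    return {min(hashes[i:i + w]) for i in range(len(hashes) - w + 1)}


def similarity_percent(title_a, title_b, k=5, w=4):
    """Jaccard similarity of the two fingerprint sets, expressed in percent."""
    fp_a = winnow(kgram_hashes(normalize(title_a), k), w)
    fp_b = winnow(kgram_hashes(normalize(title_b), k), w)
    if not fp_a or not fp_b:
        return 0.0
    return 100.0 * len(fp_a & fp_b) / len(fp_a | fp_b)


if __name__ == "__main__":
    submitted = "Plagiarism Detection of Final Project Titles"
    existing = "Detection of Plagiarism in Final Project Titles"
    print(round(similarity_percent(submitted, existing), 2))
```

In this sketch a submitted title would be compared against each title retrieved from Google Scholar, and the highest percentage would then be mapped to a plagiarism category using whatever thresholds the system defines.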