Performance efficiency in plagiarism indication detection system using indexing method with data structure 2-3 tree
- 1 May 2014
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE) in 2014 2nd International Conference on Information and Communication Technology (ICoICT)
Abstract
Plagiarism is a widespread form of cheating, and one way to prevent it is to build an anti-plagiarism system. A system that compares a query document with every document in the database takes a very long time, and the more irrelevant documents the database contains, the more time is wasted on comparisons. This paper discusses a plagiarism detection system that uses an indexing method to eliminate irrelevant documents, reducing the set of database documents that must be matched against the query document. Matching between the query document and the database documents is done with the Longest Common Subsequence (LCS) algorithm. The system eliminates irrelevant documents with an inverted index stored in a 2-3 tree data structure. Indexing is done by inserting document fingerprints, which are computed with the winnowing algorithm. The results show that executing 1 query against a corpus of 10000 documents, most of them not relevant, takes 59 seconds with indexing and 134 seconds without. The F-measure, which combines precision and recall, is 0.7387 with indexing, using 0.15 as the threshold for index elimination, and 0.000428 without indexing.
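The pipeline described in the abstract (winnowing fingerprints, an inverted index that filters out irrelevant documents, then LCS matching on the survivors) can be illustrated with a minimal sketch. This is not the authors' implementation. It rests on several assumptions: a plain Python dict stands in for the 2-3 tree, CRC32 is used as the k-gram hash, the parameters k and w are arbitrary, and the threshold is interpreted as the fraction of query fingerprints that a document shares. All function names are hypothetical.

```python
import zlib
from collections import defaultdict

def winnow(text, k=5, w=4):
    """Winnowing fingerprints: the minimum k-gram hash in each window of w hashes.
    k, w and the CRC32 hash are illustrative choices, not taken from the paper."""
    s = "".join(c.lower() for c in text if c.isalnum())
    hashes = [zlib.crc32(s[i:i + k].encode()) for i in range(len(s) - k + 1)]
    if not hashes:
        return set()
    # max(1, ...) keeps one window when the text yields fewer than w hashes.
    return {min(hashes[i:i + w]) for i in range(max(1, len(hashes) - w + 1))}

def build_index(corpus):
    """Inverted index: fingerprint -> ids of documents containing it.
    A dict is a stand-in here; the paper stores the index in a 2-3 tree."""
    index = defaultdict(set)
    for doc_id, text in corpus.items():
        for fp in winnow(text):
            index[fp].add(doc_id)
    return index

def candidates(index, query, threshold=0.15):
    """Keep documents sharing at least `threshold` of the query's fingerprints.
    This reading of the threshold is an assumption."""
    q = winnow(query)
    hits = defaultdict(int)
    for fp in q:
        for doc_id in index.get(fp, ()):
            hits[doc_id] += 1
    return {d for d, n in hits.items() if n / len(q) >= threshold}

def lcs_length(a, b):
    """Length of the longest common subsequence, by row-wise dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]

corpus = {
    "d1": "Plagiarism is a form of cheating that happens very often.",
    "d2": "Binary search trees keep their keys in sorted order.",
}
index = build_index(corpus)
query = "Plagiarism is a form of cheating that happens often."

# Only documents that survive the index filter reach the costly LCS step.
for doc_id in sorted(candidates(index, query)):
    print(doc_id, lcs_length(query.split(), corpus[doc_id].split()))
```

In this toy corpus the unrelated document shares no fingerprint with the query, so it never reaches the LCS comparison. That pruning is the kind of saving the reported timings attribute to indexing.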
This publication has 9 references indexed in Scilit:
- Inverted indexes for phrases and strings. Association for Computing Machinery (ACM), 2011
- Introduction to Information Retrieval. Cambridge University Press (CUP), 2008
- A New Efficient Algorithm for Computing the Longest Common Subsequence. Theory of Computing Systems, 2008
- Fast Plagiarism Detection System. Lecture Notes in Computer Science, 2005
- Winnowing. Association for Computing Machinery (ACM), 2003
- Plagiarism in programming assignments. IEEE Transactions on Education, 1999
- Efficient randomized pattern-matching algorithms. IBM Journal of Research and Development, 1987
- Algorithms for the Longest Common Subsequence Problem. Journal of the ACM, 1977
- Organization and maintenance of large ordered indexes. Acta Informatica, 1972