Retrieval of source documents in a text reuse system
Open Access
- 13 March 2020
- journal article
- Published by Institute of Research and Community Services Diponegoro University (LPPM UNDIP) in Jurnal Teknologi dan Sistem Komputer
- Vol. 8 (2), 140-149
- https://doi.org/10.14710/jtsiskom.8.2.2020.140-149
Abstract
The architecture of a text-reuse detection system consists of three main modules: source retrieval, text analysis, and knowledge-based postprocessing. Each module affects the accuracy of the detection output. This research focuses on developing the source retrieval module for cases where the source documents have been obfuscated at different levels. Two term-weighting steps were applied to retrieve such documents. The first was local-word weighting, applied to the test (reused) documents to select queries for each text segment. The second was tf-idf term weighting, applied to index all documents in the corpus and used as the basis for computing cosine similarity between the per-segment queries and the corpus documents. A two-step filtering technique was then applied to obtain the source document candidates. On artificial text-reuse test cases, the system achieves precision and recall of 0.967 each, while recall on simulated cases of reused text is 0.66.
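To make the tf-idf indexing and cosine-similarity step concrete, the following is a minimal sketch under stated assumptions, not the authors' implementation. The whitespace tokenizer, the smoothed idf formula, the single similarity threshold, and all function names (`build_index`, `tfidf_vector`, `retrieve`) are hypothetical choices made for illustration. The abstract does not specify the local-word weighting or the two-step filtering in detail, so neither is reproduced here; the query string simply stands in for a query selected from one text segment.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization, no stemming or stopword removal.
    return text.lower().split()


def tfidf_vector(tokens, idf):
    # Raw term frequency times idf; terms unseen in the corpus are ignored.
    tf = Counter(tokens)
    return {t: f * idf[t] for t, f in tf.items() if t in idf}


def build_index(docs):
    # Index every corpus document as a sparse tf-idf vector.
    n = len(docs)
    tokenized = [tokenize(d) for d in docs]
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))
    # Assumption: smoothed idf, log((1 + n) / (1 + df)) + 1.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = [tfidf_vector(tokens, idf) for tokens in tokenized]
    return idf, vectors


def cosine(u, v):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def retrieve(query, idf, vectors, threshold=0.1):
    # Score all documents against one segment query, drop those below the
    # (hypothetical) threshold, and rank the rest by descending similarity.
    q = tfidf_vector(tokenize(query), idf)
    scored = [(i, cosine(q, v)) for i, v in enumerate(vectors)]
    hits = [(i, s) for i, s in scored if s >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


corpus = [
    "the cat sat on the mat",
    "stock markets fell sharply on monday",
    "a cat and a dog sat together",
]
idf, vectors = build_index(corpus)
ranked = retrieve("cat sat mat", idf, vectors)
print(ranked)
```

In this toy corpus the query shares terms with the first and third documents only, so the second document scores zero and is filtered out. In a real system, one such ranking would be produced per text segment and the candidate lists would then be combined and filtered further.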
Funding Information
- Universitas Kristen Duta Wacana