Open Access
Comparison of the BM25 and Rabin-Karp algorithm for plagiarism detection
Author(s) -
I Nyoman Saputra Wahyu Wijaya,
K A Seputra,
W G S Parwita
Publication year - 2021
Publication title -
Journal of Physics: Conference Series
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.21
H-Index - 85
eISSN - 1742-6596
pISSN - 1742-6588
DOI - 10.1088/1742-6596/1810/1/012032
Subject(s) - plagiarism detection , computer science , hash function , process (computing) , indonesian , information retrieval , test (biology) , algorithm , programming language , linguistics , paleontology , philosophy , biology
Plagiarism is made easier by how readily data can be distributed. Detecting plagiarism in documents such as student assignments and final projects is a lengthy process and is often overlooked. Nevertheless, to prevent plagiarism, each document must be checked for its level of plagiarism. Detection can be performed online or offline with a plagiarism checker. However, checkers such as Turnitin, Dupli Checker, Copyleaks, PaperRater, Grammarly and others charge additional fees. Several studies have addressed plagiarism detection, and BM25 and Rabin-Karp are two methods used in plagiarism checkers: BM25 is based on TF-IDF, while Rabin-Karp is based on hashing. The performance of each method in detecting plagiarism needs to be known. To address these problems, this study compares plagiarism detection using the BM25 algorithm with the Rabin-Karp algorithm, using Indonesian-language articles as the case study. Before either algorithm is applied, the text passes through a pre-processing stage consisting of case folding, cleaning, tokenizing, filtering, and stemming. The Sastrawi stemmer was used for the stemming step. Testing was conducted on twenty Indonesian articles, and performance was measured in terms of execution time.
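The five pre-processing steps named in the abstract can be sketched as follows. This is a minimal illustration, not the authors' code: the stopword set `STOPWORDS` is a hypothetical and deliberately tiny sample, the cleaning rule (keep only lowercase ASCII letters and whitespace) is an assumption, and stemming is a pluggable function whose identity default merely stands in for the Sastrawi stemmer.

```python
import re

# Hypothetical, deliberately tiny Indonesian stopword sample for illustration;
# a real system would use a complete stopword list.
STOPWORDS = {"yang", "dan", "di", "ke", "dari", "ini", "itu", "untuk", "dengan"}


def preprocess(text, stem=lambda word: word):
    """Apply case folding, cleaning, tokenizing, filtering and stemming.

    `stem` is a pluggable stemming function. The identity default is only a
    placeholder and does NOT reproduce the Sastrawi stemmer.
    """
    # 1. Case folding: convert the whole text to lowercase.
    text = text.lower()
    # 2. Cleaning (assumed rule): replace anything that is not a letter or
    #    whitespace, e.g. digits and punctuation, with a space.
    text = re.sub(r"[^a-z\s]", " ", text)
    # 3. Tokenizing: split on runs of whitespace.
    tokens = text.split()
    # 4. Filtering: drop stopwords.
    tokens = [t for t in tokens if t not in STOPWORDS]
    # 5. Stemming: reduce each remaining token with the supplied function.
    return [stem(t) for t in tokens]


print(preprocess("Mahasiswa itu membaca buku di perpustakaan!"))
```

If the Sastrawi package is installed, its stemmer could be plugged in as `stem=StemmerFactory().create_stemmer().stem`, with `StemmerFactory` imported from `Sastrawi.Stemmer.StemmerFactory`.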
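BM25 scores a document against a query by weighting each matching term's frequency with its inverse document frequency and normalizing for document length. The sketch below assumes the common defaults k1 = 1.5 and b = 0.75 and the IDF variant ln((N - n + 0.5) / (n + 0.5) + 1); the abstract does not state which parameters or IDF form the study used. Treating a suspect document's pre-processed tokens as the query and the reference articles as the corpus is likewise an assumption.

```python
import math
from collections import Counter


def bm25_scores(query_tokens, documents, k1=1.5, b=0.75):
    """Score each tokenized document against a tokenized query with Okapi BM25.

    k1 and b are assumed defaults, not values taken from the study.
    """
    if not documents:
        return []
    n_docs = len(documents)
    # Average document length; fall back to 1.0 if every document is empty.
    avgdl = sum(len(d) for d in documents) / n_docs or 1.0

    # Document frequency: how many documents contain each term.
    df = Counter()
    for doc in documents:
        df.update(set(doc))

    scores = []
    for doc in documents:
        tf = Counter(doc)
        # Length normalization: longer-than-average documents are penalized.
        length_norm = k1 * (1 - b + b * len(doc) / avgdl)
        score = 0.0
        # Every occurrence of a term in the query contributes once.
        for term in query_tokens:
            if term not in tf:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


corpus = [["a", "b", "c"], ["a", "d", "e"], ["f", "g", "h"]]
print(bm25_scores(["b", "c"], corpus))
```

Adding 1 inside the logarithm keeps every IDF weight positive, even for terms that occur in more than half of the documents, so a shared term never lowers a score.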
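Rabin-Karp hashes every k-character substring (k-gram) of a text and updates the hash in constant time per step by rolling the previous value forward. In the sketch below, the base 256, the modulus 1,000,000,007, the default k = 5, and the use of the Dice coefficient 2|A ∩ B| / (|A| + |B|) over distinct k-grams as the similarity score are all assumptions, since the abstract does not specify them. A hash hit is treated only as a candidate and is confirmed against the actual k-gram text to rule out collisions.

```python
BASE = 256            # assumed radix for the polynomial hash
MOD = 1_000_000_007   # assumed large prime modulus


def rolling_hashes(text, k):
    """Yield (hash, k-gram) for every k-gram of text using a rolling hash."""
    if len(text) < k:
        return
    high = pow(BASE, k - 1, MOD)  # weight of the leading character
    h = 0
    for ch in text[:k]:
        h = (h * BASE + ord(ch)) % MOD
    yield h, text[:k]
    for i in range(k, len(text)):
        # Remove the outgoing character, shift left, add the incoming one.
        h = ((h - ord(text[i - k]) * high) * BASE + ord(text[i])) % MOD
        yield h, text[i - k + 1 : i + 1]


def rabin_karp_similarity(a, b, k=5):
    """Dice coefficient over the distinct k-grams of a and b (assumed metric)."""
    # Index b's k-grams by hash for average O(1) lookups.
    index_b = {}
    for h, gram in rolling_hashes(b, k):
        index_b.setdefault(h, set()).add(gram)
    grams_a, common = set(), set()
    for h, gram in rolling_hashes(a, k):
        grams_a.add(gram)
        # A hash hit is only a candidate; compare the text to rule out collisions.
        if gram in index_b.get(h, ()):
            common.add(gram)
    grams_b = set().union(*index_b.values())
    total = len(grams_a) + len(grams_b)
    return 2 * len(common) / total if total else 0.0


print(rabin_karp_similarity("abcd", "abce", k=3))
```

In this example each string has two distinct 3-grams and they share one (`abc`), so the score is 2 × 1 / 4 = 0.5.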