
Selection of the Best K-Gram Value on Modified Rabin-Karp Algorithm
Author(s) -
Wahyu Hidayat,
Ema Utami,
Andi Sunyoto
Publication year - 2022
Publication title -
Indonesian Journal of Computing and Cybernetics Systems
Language(s) - English
Resource type - Journals
eISSN - 2460-7258
pISSN - 1978-1520
DOI - 10.22146/ijccs.63686
Subject(s) - similarity (geometry) , value (mathematics) , gram , n gram , hash function , group (periodic table) , statistics , selection (genetic algorithm) , computer science , algorithm , mathematics , combinatorics , natural language processing , artificial intelligence , biology , chemistry , computer security , organic chemistry , bacteria , language model , image (mathematics) , genetics
The Rabin-Karp algorithm detects similarity between texts using hashing. Related studies have modified the hashing process, but none has investigated the best value of k in the K-Gram process. In the stemming stage, the Nazief & Adriani algorithm is used to reduce words to their base forms. This study tests several K-Gram values to determine the best one. The analysis uses the public Ukara Enhanced dataset obtained from Kaggle, which contains 12215 records in total. The student essay answers comprise 258 records in group A and 305 in group B; each answer is compared with the answers of the other members of the same group. The results show that k = 3 performs best: it yields the highest number of interpretations of 1-14% (Little degree of similarity) and 15-50% (Medium level of similarity), whereas k = 5, 7, and 9 yield the highest number of interpretations of 0%-0.99% (Document is different). However, when the compared student essay answers are 100% (Exactly the same), the value of k in K-Gram does not affect the result.
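The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' modified algorithm: the function names, the hash base and modulus, the character-level k-grams, and the set-based Dice coefficient used as the similarity score are all assumptions made for illustration, and the Nazief & Adriani stemming step is omitted.

```python
def normalize(text):
    # Assumption: lowercase and keep only letters and digits.
    # The stemming step from the abstract is not reproduced here.
    return "".join(ch for ch in text.lower() if ch.isalnum())


def rolling_hashes(text, k, base=256, mod=1_000_003):
    """Hash every character k-gram of `text` with a Rabin-Karp rolling hash.

    `base` and `mod` are illustrative values, not taken from the paper.
    """
    s = normalize(text)
    if k <= 0 or len(s) < k:
        return []
    high = pow(base, k - 1, mod)  # weight of the leading character
    h = 0
    for ch in s[:k]:
        h = (h * base + ord(ch)) % mod
    hashes = [h]
    for i in range(k, len(s)):
        # Drop the leading character, shift, then append the new one.
        h = ((h - ord(s[i - k]) * high) * base + ord(s[i])) % mod
        hashes.append(h)
    return hashes


def similarity(a, b, k):
    """Similarity in percent, using a Dice coefficient over sets of hashes.

    Assumption: the abstract does not state the similarity formula.
    """
    ha, hb = set(rolling_hashes(a, k)), set(rolling_hashes(b, k))
    if not ha and not hb:
        return 0.0
    return 200.0 * len(ha & hb) / (len(ha) + len(hb))


if __name__ == "__main__":
    answer_1 = "the cat sat on the mat"
    answer_2 = "the cat sat on a mat"
    for k in (3, 5, 7, 9):
        print(k, round(similarity(answer_1, answer_2, k), 2))
```

Under these assumptions, two identical answers score 100% for every k, because they produce identical hash sets. This matches the abstract's observation that k does not affect the result for exactly matching answers.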