
Detection of Text Similarity for Indication Plagiarism Using Winnowing Algorithm Based K-gram and Jaccard Coefficient
Author(s) -
Eva Yulia Puspaningrum,
Budi Nugroho,
Ariyono Setiawan,
Nuraini Hariyanti
Publication year - 2020
Publication title -
Journal of Physics: Conference Series
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.21
H-Index - 85
eISSN - 1742-6596
pISSN - 1742-6588
DOI - 10.1088/1742-6596/1569/2/022044
Subject(s) - jaccard index , similarity (geometry) , gram , fingerprint (computing) , n gram , winnowing , value (mathematics) , hash function , computer science , mathematics , statistics , pattern recognition (psychology) , artificial intelligence , engineering , mechanical engineering , computer security , biology , bacteria , language model , image (mathematics) , genetics
Documents are one form of digital data, and they can easily be copied and deleted: anyone can retype or copy parts of a document. This paper detects text similarity, on the premise that the more words two documents share, the stronger the indication of plagiarism. The winnowing algorithm computes a hash value for each k-gram of a document. This method reduces search time and improves accuracy in the detection process. The hash values selected by winnowing form the document's fingerprint, which serves as the basis for comparing the similarity of text data. The fingerprints produced by the winnowing process for each document are then matched using the Jaccard coefficient to measure text similarity. The results show that adjusting the k-gram and window values affects the final similarity percentage: the smaller the k-gram value, the greater the similarity percentage.
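The pipeline in the abstract (hash each k-gram, winnow the hashes into a fingerprint, compare fingerprints with the Jaccard coefficient J(A, B) = |A ∩ B| / |A ∪ B|) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation. The cleaning rule (lowercase, keep only ASCII letters and digits), the polynomial hash with hypothetical constants `BASE` and `MOD`, the short-document fallback, and the default `k` and `w` values are all assumptions. The helper names `preprocess`, `kgram_hashes`, `winnow`, `jaccard` and `similarity` are hypothetical. The window step follows the standard winnowing rule: take the rightmost minimum hash in each window of `w` consecutive hashes.

```python
import re

BASE = 31              # hypothetical polynomial-hash base (assumption)
MOD = 1_000_000_007    # hypothetical hash modulus (assumption)


def preprocess(text: str) -> str:
    """Assumed cleaning step: lowercase and keep only ASCII letters and digits."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def kgram_hashes(text: str, k: int) -> list[int]:
    """Hash every contiguous k-gram of the cleaned text with a polynomial hash."""
    hashes = []
    for i in range(len(text) - k + 1):
        h = 0
        for ch in text[i:i + k]:
            h = (h * BASE + ord(ch)) % MOD
        hashes.append(h)
    return hashes


def winnow(hashes: list[int], w: int) -> set[int]:
    """Select the rightmost minimum hash in each window of w consecutive hashes.

    A position that stays the minimum across several windows is recorded once.
    The fingerprint returned here is the set of selected hash values.
    """
    if len(hashes) < w:
        # Too short for a full window: keep the single smallest hash (assumption).
        return {min(hashes)} if hashes else set()
    fingerprints = set()
    last_pos = -1
    for start in range(len(hashes) - w + 1):
        window = hashes[start:start + w]
        # Ties on the hash value are broken toward the larger index (rightmost).
        offset = min(range(w), key=lambda j: (window[j], -j))
        pos = start + offset
        if pos != last_pos:
            fingerprints.add(hashes[pos])
            last_pos = pos
    return fingerprints


def jaccard(a: set[int], b: set[int]) -> float:
    """Jaccard coefficient |A & B| / |A | B|; taken as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity(doc1: str, doc2: str, k: int = 5, w: int = 4) -> float:
    """Similarity percentage of two documents from their winnowing fingerprints."""
    fp1 = winnow(kgram_hashes(preprocess(doc1), k), w)
    fp2 = winnow(kgram_hashes(preprocess(doc2), k), w)
    return 100.0 * jaccard(fp1, fp2)


if __name__ == "__main__":
    a = "The winnowing algorithm selects fingerprints from k-gram hashes."
    b = "Winnowing selects document fingerprints from the hashes of k-grams."
    # Vary the k-gram size with a fixed window to see how the percentage moves.
    for k in (3, 5, 7):
        print(k, round(similarity(a, b, k=k, w=4), 2))
```

The abstract reports that smaller k-gram values give larger similarity percentages. The loop at the end is one way to check that trend on your own texts; no particular output is claimed for these sample sentences. Recording only the selected values, rather than (hash, position) pairs, keeps the Jaccard comparison a plain set operation.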