
Analisis Pengaruh Teks Preprocessing Terhadap Deteksi Plagiarisme Pada Dokumen Tugas Akhir (Analysis of the Effect of Text Preprocessing on Plagiarism Detection in Final Project Documents)
Author(s) - Ariel Elbert Budiman, Andreas Widjaja
Publication year - 2020
Publication title - jutisi (jurnal teknik informatika dan sistem informasi)
Language(s) - English
Resource type - Journals
ISSN - 2443-2229
DOI - 10.28932/jutisi.v6i3.2892
Subject(s) - cosine similarity, similarity (geometry), preprocessor, string (physics), computer science, plagiarism detection, data pre processing, value (mathematics), information retrieval, artificial intelligence, pattern recognition (psychology), data mining, mathematics, machine learning, image (mathematics), mathematical physics
Final project reports at a university are susceptible to plagiarism. String similarity can be used to detect possible plagiarism, but text preprocessing is needed to handle words that would otherwise make the string similarity results inaccurate. A higher distribution value of the similarity results indicates a higher level of accuracy. Reports that contain many words make it difficult to produce plagiarism recommendations. In this study, we divide each report into its chapters to provide more detailed recommendation material. Applying text preprocessing and comparing the same chapter across reports makes it possible to determine the characteristics of each chapter. These characteristics can then serve as plagiarism recommendation material that is more detailed than a full-text comparison. The experiment compared cosine similarity results between matching chapters and between full texts, in combination with stopword removal and stemming as preprocessing steps. The experimental results show that using stopword removal and stemming together produces the highest distribution value, and that the similarity ratio of each chapter reflects its characteristics. Words that characterize a chapter are potential candidates for the stopword list.
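The pipeline described in the abstract (stopword removal, stemming, then cosine similarity per chapter) can be illustrated with a minimal sketch. This is not the authors' implementation: the stopword list, the `naive_stem` suffix stripper, and the function names `preprocess`, `cosine_similarity`, and `compare_by_chapter` are all assumptions made for illustration, and the term-frequency weighting is one common choice that the abstract does not confirm. A real experiment on Indonesian reports would need a proper stopword list and stemmer for that language.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (illustration only).
STOPWORDS = {"the", "a", "an", "of", "in", "is", "and", "to"}


def naive_stem(token):
    """Toy suffix stripper standing in for a real stemmer (assumption)."""
    for suffix in ("ing", "ed", "s"):
        # Keep at least three characters so short words are left alone.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text, remove_stopwords=True, stem=True):
    """Lowercase, tokenize on letters, then optionally filter and stem."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if remove_stopwords:
        tokens = [t for t in tokens if t not in STOPWORDS]
    if stem:
        tokens = [naive_stem(t) for t in tokens]
    return tokens


def cosine_similarity(tokens_a, tokens_b):
    """Cosine similarity between two term-frequency vectors."""
    freq_a, freq_b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(freq_a[t] * freq_b[t] for t in freq_a.keys() & freq_b.keys())
    norm_a = math.sqrt(sum(v * v for v in freq_a.values()))
    norm_b = math.sqrt(sum(v * v for v in freq_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # An empty document is treated as dissimilar to everything.
    return dot / (norm_a * norm_b)


def compare_by_chapter(report_a, report_b, **options):
    """Similarity for every chapter name present in both reports.

    Each report is a dict mapping a chapter name to its text; `options`
    are forwarded to `preprocess`.
    """
    return {
        name: cosine_similarity(
            preprocess(report_a[name], **options),
            preprocess(report_b[name], **options),
        )
        for name in report_a.keys() & report_b.keys()
    }


# Made-up example reports, split by chapter.
report_1 = {"introduction": "Tested systems", "method": "Cosine similarity"}
report_2 = {"introduction": "The testing of a system", "method": "Survey design"}

with_preprocessing = compare_by_chapter(report_1, report_2)
without_preprocessing = compare_by_chapter(
    report_1, report_2, remove_stopwords=False, stem=False
)
```

In this made-up example the two introductions share no raw tokens, so they only register as similar once stopword removal and stemming are applied, which is the kind of effect the study measures. Comparing the two result dictionaries chapter by chapter mirrors the paper's contrast between preprocessing variants.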