Open Access
Text Similarity Identification Using Class-Based Indexing and Cosine Similarity for Complaint Document Classification
Author(s) -
Syahroni Wahyu Iriananda,
Muhammad Aziz Muslim,
Harry Soekotjo Dachlan
Publication year - 2019
Publication title -
matics
Language(s) - English
Resource type - Journals
eISSN - 2477-2550
pISSN - 1978-161X
DOI - 10.18860/mat.v10i2.5327
Subject(s) - cosine similarity , similarity (geometry) , weighting , computer science , preprocessor , search engine indexing , data pre processing , class (philosophy) , feature (linguistics) , pattern recognition (psychology) , artificial intelligence , data mining , physics , linguistics , philosophy , acoustics , image (mathematics)
Report handling in the "LAPOR!" system depends on a system administrator who manually reads every incoming report [3]. Manual reading can lead to errors in handling complaints [4]. When the data flow is very large and grows rapidly, handling can take at least three days and is prone to inconsistencies [3]. In this study, the authors propose a computerized model that measures and identifies the similarity between a Query (incoming report) and a Document (archived report). The authors employ the Class-Based Indexing term weighting scheme together with Cosine Similarity to analyze document similarity. The CoSimTFIDF, CoSimTFICF and CoSimTFIDFICF values are defined as feature sets for text classification using the K-Nearest Neighbor (K-NN) method. The best evaluation results were obtained when preprocessing included stemming. Across all features, the best result was obtained with a 75% training and 25% test data split on the CoSimTFIDF feature, at 84% accuracy. With k = 5, accuracy reached 84.12%.
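The weighting and similarity steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the smoothed log formulas for IDF and ICF are common textbook variants chosen here as an assumption, the function names and the toy archive are hypothetical, and the k-NN step ranks archived reports directly by cosine similarity rather than building the CoSimTFIDF, CoSimTFICF and CoSimTFIDFICF feature sets described above.

```python
import math
from collections import Counter


def idf_table(docs):
    """Inverse document frequency per term (smoothed variant, an assumption)."""
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc))
    return {t: math.log(n / df[t]) + 1.0 for t in df}


def icf_table(docs, labels):
    """Inverse class frequency per term: rarer across classes means heavier."""
    classes = set(labels)
    cf = Counter()
    for c in classes:
        # Count each term once per class in which it appears.
        cf.update({t for d, l in zip(docs, labels) if l == c for t in d})
    return {t: math.log(len(classes) / cf[t]) + 1.0 for t in cf}


def weight(tokens, table):
    """Term-frequency vector scaled by the given weight table."""
    return {t: c * table.get(t, 0.0) for t, c in Counter(tokens).items()}


def cosine(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def knn_predict(query, docs, labels, table, k=5):
    """Majority vote over the k archived documents most similar to the query."""
    q = weight(query, table)
    scored = sorted(
        ((cosine(q, weight(d, table)), l) for d, l in zip(docs, labels)),
        reverse=True,
    )
    votes = Counter(l for _, l in scored[:k])
    return votes.most_common(1)[0][0]


# Toy, already-tokenized archive (hypothetical data, not from the study).
archive = [
    ["road", "damaged", "hole"],
    ["road", "hole", "repair"],
    ["water", "pipe", "leak"],
    ["water", "supply", "stopped"],
]
labels = ["infrastructure", "infrastructure", "utilities", "utilities"]
query = ["road", "hole"]

idf = idf_table(archive)
icf = icf_table(archive, labels)
# Combined TF-IDF-ICF weighting: multiply the two tables term by term.
idf_icf = {t: idf[t] * icf[t] for t in idf}

predicted = knn_predict(query, archive, labels, idf_icf, k=3)
print(predicted)
```

Passing `idf`, `icf`, or `idf_icf` as the `table` argument switches between the three weighting schemes, which makes it easy to compare how each one ranks the archived reports for the same query.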
