
Open Problems in Indonesian Automatic Essay Scoring System
Author(s) -
Faisal Rahutomo,
Trisna Ari Roshinta,
Erfan Rohadi,
Indrazno Siradjuddin,
Rudy Ariyanto,
Awan Setiawan,
Supriatna Adhisuwignjo
Publication year - 2018
Publication title -
International Journal of Engineering and Technology
Language(s) - English
Resource type - Journals
ISSN - 2227-524X
DOI - 10.14419/ijet.v7i4.44.26974
Subject(s) - jaccard index , euclidean distance , cosine similarity , similarity (geometry) , computer science , metric (unit) , artificial intelligence , data mining , mathematics , statistics , pattern recognition (psychology) , engineering , operations management , image (mathematics)
This paper presents open problems in Indonesian automatic essay scoring systems. A previous study compared several similarity metrics for automated essay scoring in Indonesian: Cosine Similarity, Euclidean Distance, and Jaccard. The data used in that research comprise about 2,000 texts, obtained from 50 students who each answered 40 questions on politics, sports, lifestyle, and technology. The study also evaluated how stemming affects system performance. Across all methods, the difference between using stemming and not using it is around 4-9%. The results show that Jaccard is the best metric both with and without stemming, and Jaccard with stemming has the lowest percentage error of all. The politics category has a higher average similarity score than lifestyle, sports, and technology. With stemming, the percentage error is 52.31% for Jaccard, 59.49% for Cosine Similarity, and 332.90% for Euclidean Distance. Without stemming, Jaccard is again the best, with a percentage error of 56.05%, compared with 57.99% for Cosine Similarity and 339.41% for Euclidean Distance. However, these percentage errors are too high for a functional essay grading system: all of them exceed 50%. This paper therefore explores several open problems in this area. The openly available dataset can be used to develop approaches that improve on the standard similarity metrics. The approaches explored range from feature extraction, similarity metrics, and learning algorithms to environment implementation and performance evaluation.
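The three metrics compared in the abstract can be sketched over bag-of-words token lists as follows. This is a minimal illustration under stated assumptions, not the study's implementation: tokenization is plain lowercase whitespace splitting, stemming is omitted, and `percentage_error` uses an assumed definition (mean absolute error relative to the human reference score, in percent), because the abstract does not say how percentage error was computed. All function names here are hypothetical.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumption: lowercase + whitespace split only; no punctuation handling or stemming.
    return text.lower().split()


def cosine_similarity(a: list[str], b: list[str]) -> float:
    # Cosine of the angle between the two term-frequency vectors (1.0 = same direction).
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)  # Counter returns 0 for missing terms
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def euclidean_distance(a: list[str], b: list[str]) -> float:
    # Straight-line distance between term-frequency vectors (0.0 = identical counts).
    ca, cb = Counter(a), Counter(b)
    vocab = set(ca) | set(cb)
    return math.sqrt(sum((ca[t] - cb[t]) ** 2 for t in vocab))


def jaccard_similarity(a: list[str], b: list[str]) -> float:
    # |A intersect B| / |A union B| over the sets of distinct tokens.
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def percentage_error(predicted: list[float], reference: list[float]) -> float:
    # Hypothetical definition: mean of |predicted - reference| / reference, times 100.
    # Assumes every reference score is non-zero.
    total = sum(abs(p - r) / r for p, r in zip(predicted, reference))
    return 100 * total / len(reference)


if __name__ == "__main__":
    reference_answer = tokenize("pemilu diadakan setiap lima tahun")
    student_answer = tokenize("pemilu diadakan lima tahun sekali")
    print("jaccard  ", jaccard_similarity(reference_answer, student_answer))
    print("cosine   ", cosine_similarity(reference_answer, student_answer))
    print("euclidean", euclidean_distance(reference_answer, student_answer))
```

For the two sample answers, the token sets share 4 of 6 distinct words, so Jaccard is 4/6 (about 0.67), cosine is 0.8, and the Euclidean distance is the square root of 2 (about 1.41). Unlike the other two, Euclidean distance is unbounded and grows as answers diverge, so it has to be mapped to a score before it can be compared with human grades (for example 1 / (1 + d), which is only an assumption here); the abstract does not state how the study performed that mapping.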