Comparison of six multiclass classifiers by the use of different classification performance indicators
Author(s) - Szöllősi Dániel, Dénes Dénes Lajos, Firtha Ferenc, Kovács Zoltán, Fekete András
Publication year - 2012
Publication title - Journal of Chemometrics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.47
H-Index - 92
eISSN - 1099-128X
pISSN - 0886-9383
DOI - 10.1002/cem.2432
Subject(s) - ranking (information retrieval) , artificial intelligence , mathematics , random forest , statistics , linear discriminant analysis , kappa , pattern recognition (psychology) , cohen's kappa , receiver operating characteristic , value (mathematics) , computer science , machine learning , geometry
Classification problems are very important, and the general question is which model is best. Several classification performance indicators, including the classification accuracy value (ACC), Cohen's kappa (KAPPA), and the area under the ROC curve (AUC), are used to answer this question. Non-parametric comparative methods, such as the sum of ranking differences, are also available. The objective of this work was to find the best classification method for classifying four soft drink samples and four model samples that differ from each other only in sweetener composition. The model samples served as base samples for comparison with the commercial soft drinks. Six different classification methods were compared according to their classification performance. A corrected classification accuracy value (corrected ACC), which takes into account the similarities between the classes, was developed and introduced for this purpose. The results showed that the ACC and KAPPA values give similar results in our case. The best three models according to ACC, KAPPA, and AUC were “K-nearest neighbor,” “random forest,” and “discriminant analysis.” However, the corrected ACC value gave a slightly different ranking, in which the random forest model was no longer among the good models. The confusion matrices of the models confirmed the ranking according to the corrected ACC value. The results showed that the best classification model for the available samples was K-nearest neighbor and that the corrected ACC value is a useful classification performance indicator. Copyright © 2012 John Wiley & Sons, Ltd.
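As an illustration of two of the indicators named in the abstract, the following minimal sketch computes ACC and Cohen's kappa from a multiclass confusion matrix using their standard textbook definitions. The function names and the 3-class matrix are hypothetical and are not taken from the paper. The corrected ACC is not reproduced here because the abstract does not give its formula.

```python
from typing import List

Matrix = List[List[int]]  # rows = true class, columns = predicted class


def accuracy(cm: Matrix) -> float:
    """Fraction of samples on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total


def cohen_kappa(cm: Matrix) -> float:
    """Cohen's kappa: agreement corrected for chance agreement."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    # Observed agreement is the same quantity as ACC.
    p_observed = sum(cm[i][i] for i in range(n)) / total
    # Expected agreement from the row (true) and column (predicted) marginals.
    row_totals = [sum(cm[i]) for i in range(n)]
    col_totals = [sum(cm[i][j] for i in range(n)) for j in range(n)]
    p_expected = sum(row_totals[k] * col_totals[k] for k in range(n)) / total**2
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical confusion matrix for illustration only (not data from the paper).
example_cm = [
    [8, 2, 0],
    [1, 9, 0],
    [0, 0, 10],
]

print(round(accuracy(example_cm), 3))
print(round(cohen_kappa(example_cm), 3))
```

For this illustrative matrix, ACC is 0.9 and kappa is 0.85. With balanced classes the two indicators tend to rank models in the same order, which is consistent with the abstract's remark that ACC and KAPPA gave similar results.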