A comprehensive search for expert classification methods in disease diagnosis and prediction
Author(s) - Jha Sunil Kr., Pan Zhaoqing, Elahi Ehsan, Patel Nilesh
Publication year - 2019
Publication title - Expert Systems
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.365
H-Index - 38
eISSN - 1468-0394
pISSN - 0266-4720
DOI - 10.1111/exsy.12343
Subject(s) - computer science, naive bayes classifier, decision tree, classifier (uml), receiver operating characteristic, cohen's kappa, data mining, decision tree learning, artificial intelligence, bayes classifier, precision and recall, machine learning, pattern recognition (psychology), support vector machine
Healthcare data analysis is a challenging and crucial research issue in the development of robust disease diagnosis and prediction systems. Many specific methods and a few common ones have been discussed in the literature for healthcare data classification. The present study implements 32 classification methods from six categories (Bayes, function‐based, lazy, meta, rule‐based, and tree‐based) with the objective of finding the best and most common categories and methods in healthcare data mining. The performance of each classification method was evaluated on analysis time, classification accuracy, precision, recall, F‐measure, area under the receiver operating characteristic curve, root mean square error, kappa coefficient, Kulczynski's measure, and Fowlkes–Mallows index, and was compared with more than 90 classification methods used in past studies. Seventeen healthcare datasets related to thyroid, cancer, skin disease, heart disease, hepatitis, lymphography, audiology, diabetes, surgery, arrhythmia, postsurvival, liver, and tumour were used to assess the classification methods. The tree‐based classification methods performed better than the other methods (with an average classification accuracy of 79.92% and a maximum accuracy of 99.50%; an analysis time of 3.91 s for the logistic model tree classifier). Furthermore, the association between datasets and classification methods is discussed.
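Several of the evaluation measures named in the abstract can be derived from a single binary confusion matrix. The following is a minimal illustrative sketch, not the authors' implementation: the function name `binary_metrics` and the example counts are hypothetical, and Kulczynski's measure is assumed here to take its common binary-classification form, the arithmetic mean of precision and recall.

```python
import math


def binary_metrics(tp, fp, fn, tn):
    """Compute common classification measures from confusion-matrix counts.

    Hypothetical helper for illustration; assumes every denominator is non-zero.
    """
    n = tp + fp + fn + tn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / n

    # F-measure: harmonic mean of precision and recall.
    f_measure = 2 * precision * recall / (precision + recall)
    # Kulczynski's measure (assumed form): arithmetic mean of precision and recall.
    kulczynski = (precision + recall) / 2
    # Fowlkes-Mallows index: geometric mean of precision and recall.
    fowlkes_mallows = math.sqrt(precision * recall)

    # Cohen's kappa: observed agreement corrected for chance agreement.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n**2
    kappa = (accuracy - expected) / (1 - expected)

    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "kulczynski": kulczynski,
        "fowlkes_mallows": fowlkes_mallows,
        "kappa": kappa,
    }


# Made-up counts, not taken from the study's datasets.
metrics = binary_metrics(tp=45, fp=5, fn=15, tn=35)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

The F‐measure, Kulczynski's measure, and Fowlkes–Mallows index are the harmonic, arithmetic, and geometric means of the same two quantities, so they coincide when precision equals recall and diverge as the two drift apart.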