A Multicriteria Weighted Vote‐Based Classifier Ensemble for Heart Disease Prediction
Author(s) - Bashir Saba, Qamar Usman, Khan Farhan Hassan
Publication year - 2016
Publication title - Computational Intelligence
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.353
H-Index - 52
eISSN - 1467-8640
pISSN - 0824-7935
DOI - 10.1111/coin.12070
Subject(s) - naive bayes classifier , computer science , decision tree , classifier (uml) , ensemble learning , artificial intelligence , ensemble forecasting , support vector machine , confusion matrix , data mining , machine learning , pattern recognition (psychology) , random subspace method , bayes classifier
The availability of large amounts of medical data creates a need for intelligent disease prediction and analysis tools that can extract hidden information. Many data-mining and statistical analysis tools are used for disease prediction, and single data-mining techniques show an acceptable level of accuracy for heart disease diagnosis. This article focuses on the prediction and analysis of heart disease using a weighted vote-based classifier ensemble. The proposed ensemble model overcomes the limitations of conventional data-mining techniques by combining five heterogeneous classifiers: naive Bayes, a decision tree based on the Gini index, a decision tree based on information gain, an instance-based learner, and support vector machines. We used five benchmark heart disease data sets from the UCI repository; each data set has a different feature space from which heart disease is predicted. The effectiveness of the proposed ensemble classifier is evaluated by comparing its performance with techniques proposed by other researchers. Tenfold cross-validation is used to handle the class imbalance problem. Moreover, confusion matrices and analysis of variance (ANOVA) statistics are used to present the prediction results of all classifiers. The experimental results show that the proposed ensemble classifier can handle all types of attributes and achieves a diagnostic accuracy of 87.37%, sensitivity of 93.75%, specificity of 92.86%, and F-measure of 82.17%. For all data sets, an F-ratio higher than the F-critical value and a p-value below 0.01 at the 95% confidence level indicate that the results are statistically significant.
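The abstract does not state how the multicriteria weights are computed, so the following is only a minimal Python sketch of weighted voting in general, not the authors' method. It assumes each of the five base classifiers has already predicted a label for one instance and has been given a numeric weight. The function name `weighted_vote` and all weights shown are hypothetical. Each classifier adds its weight to the label it predicts, and the label with the largest total wins.

```python
from collections import defaultdict
from collections.abc import Hashable, Sequence


def weighted_vote(predictions: Sequence[Hashable], weights: Sequence[float]) -> Hashable:
    """Combine one instance's base-classifier labels by weighted voting.

    predictions[i] is the label predicted by classifier i and weights[i] is its
    (hypothetical) voting weight. Each classifier adds its weight to the label it
    predicts; the label with the largest total wins. On a tie, the label that
    appears first in `predictions` is returned.
    """
    if len(predictions) != len(weights):
        raise ValueError("need exactly one weight per classifier")
    if not predictions:
        raise ValueError("need at least one classifier")
    totals: defaultdict = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    # dicts keep insertion order, so max() resolves ties in favour of the earliest label
    return max(totals, key=totals.__getitem__)


if __name__ == "__main__":
    # Hypothetical outputs of: naive Bayes, Gini tree, information-gain tree,
    # instance-based learner, SVM (1 = heart disease, 0 = no heart disease)
    labels = [1, 0, 0, 0, 1]
    print(weighted_vote(labels, [0.9, 0.2, 0.2, 0.2, 0.9]))  # 1: two strong voters outweigh three weak ones
    print(weighted_vote(labels, [1.0] * 5))                  # 0: equal weights reduce to a plain majority vote
```

With equal weights the scheme reduces to ordinary majority voting. Unequal weights, for example derived from each classifier's validation performance (one possible choice, assumed here), let more reliable classifiers override a numerical majority.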
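The reported accuracy, sensitivity, specificity, and F-measure are standard quantities derived from a binary confusion matrix. The minimal Python sketch below assumes the positive class is "heart disease present". Because the abstract does not say which F variant is used, it also assumes the F-measure is the balanced F1 score. The names `BinaryConfusion` and `_ratio` and the example counts are hypothetical.

```python
from dataclasses import dataclass


def _ratio(numerator: int, denominator: int) -> float:
    """Divide, returning 0.0 when the denominator is zero (e.g. no positive predictions)."""
    return numerator / denominator if denominator else 0.0


@dataclass(frozen=True)
class BinaryConfusion:
    """Counts from a 2x2 confusion matrix; positive = heart disease present (assumed)."""

    tp: int  # diseased, predicted diseased
    fn: int  # diseased, predicted healthy
    fp: int  # healthy, predicted diseased
    tn: int  # healthy, predicted healthy

    def accuracy(self) -> float:
        return _ratio(self.tp + self.tn, self.tp + self.fn + self.fp + self.tn)

    def sensitivity(self) -> float:
        # True-positive rate (recall): share of diseased patients detected.
        return _ratio(self.tp, self.tp + self.fn)

    def specificity(self) -> float:
        # True-negative rate: share of healthy patients correctly identified as healthy.
        return _ratio(self.tn, self.tn + self.fp)

    def precision(self) -> float:
        return _ratio(self.tp, self.tp + self.fp)

    def f_measure(self) -> float:
        # Balanced F1, equal to 2TP / (2TP + FP + FN), the harmonic mean of
        # precision and sensitivity whenever that mean is defined.
        return _ratio(2 * self.tp, 2 * self.tp + self.fp + self.fn)


if __name__ == "__main__":
    cm = BinaryConfusion(tp=45, fn=5, fp=10, tn=40)  # made-up counts for illustration
    print(cm.accuracy(), cm.sensitivity(), cm.specificity(), round(cm.f_measure(), 3))
```

For the made-up counts above, this prints `0.85 0.9 0.8 0.857`. The example shows that sensitivity and specificity can differ noticeably even when overall accuracy looks reasonable.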