
New Method Based Pre-Processing to Tackle Missing and High Dimensional Data of CRISP-DM Approach
Author(s) -
Joko Suntoro,
Ahmad Ilham,
Handini Arga Damar Rani
Publication year - 2020
Publication title -
journal of physics. conference series
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.21
H-Index - 85
eISSN - 1742-6596
pISSN - 1742-6588
DOI - 10.1088/1742-6596/1471/1/012012
Subject(s) - preprocessor , computer science , missing data , data pre processing , data mining , kidney disease , naive bayes classifier , artificial intelligence , machine learning , medicine , support vector machine
The kidneys are one of the most important organs including the excretion system in humans. The kidneys are responsible for maintaining blood concentrations to remain constant (homeostatic) and help to control blood pressure (BP). If the task of the kidney is not functioning properly it will cause kidney failure. In the past decade, data mining methods have been used to diagnose kidney failure. The dataset used to predict kidney failure was successfully summarized by Soundarapandian, and was named the Chronic Kidney Disease (CKD) dataset. But the data in the CKD dataset contains missing value and high dimension data (original data) so that it affects the evaluation results on classification. This research proposes methods in preprocessing data, namely modus in every class (MEC) method to solve missing value problems, and the weight information gain (WIG) method for solving high dimensional data problems, the proposed method is named the MEC + WIG method. The MEC + WIG method will be compared with the original method and the MEC method and evaluated based on the accuracy of the traditional classification method (k-NN, Naïve Bayes, C4.5, and CART). The results showed that the average accuracy of the MEC + WIG method was better than the original method and the MEC method, with the average accuracy of the MEC + WIG method at 98.13%, while the average value of the accuracy of the original method and MEC respectively amounting to 88.56% and 92.88%. There were significant differences between the three methods when tested using Friedman test with a p-value of 0.02. It can be concluded that the MEC + WIG method can improve the performance of traditional methods k-NN, Naive Bayes, C4.5 and CART by overcoming the problem of missing value and data high dimension.