Open Access
Performance Analysis of Big data Classification Techniques on Diabetes Prediction
Author(s) - P. Pandeeswary, Dr. M. Janaki
Publication year - 2019
Publication title - International Journal of Innovative Technology and Exploring Engineering
Language(s) - English
Resource type - Journals
ISSN - 2278-3075
DOI - 10.35940/ijitee.j8840.0881019
Subject(s) - computer science, big data, machine learning, confusion matrix, artificial intelligence, data mining, naive Bayes classifier, support vector machine, statistical classification, sort, information retrieval
Big data refers to extremely large data sets that are analyzed computationally to expose patterns and trends and to support prediction, which simplifies decision making. Predicting diseases has become very important, and it can be achieved by applying classification techniques to a large dataset. Various big data analytics tools are available for classification, which is the general technique used in medical analysis for data prediction. In this paper, the classification algorithms Support Vector Machine, Naïve Bayesian, and C4.5 are discussed. The Pima Indian Diabetes Database (PIDD), an openly accessible machine learning database found at UCI, is used to analyze the classification algorithms, which classify people as diabetes positive or diabetes negative. The objective is to find the most suitable technique for prediction. We compare the results of the three supervised learning algorithms on three criteria: computation time, accuracy rate, and error rate, using the Tanagra tool. The classification algorithms predict diabetes based on the given data. Although many classification techniques exist, this study suggests a few for use in big data analysis that have the potential to significantly improve prediction. A representative confusion matrix is displayed to make the verification process faster. From the results, it is concluded that the C4.5 algorithm is best suited for predicting diabetes and can also be used in other disciplines to make better predictions.
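
The three comparison criteria named in the abstract (computation time, accuracy rate, and error rate) together with the confusion matrix can be illustrated with a short sketch. This is not the authors' Tanagra workflow: the function names `confusion_matrix` and `evaluate`, the toy glucose values, and the threshold rule standing in for a trained classifier are all hypothetical, chosen only to show how the quantities relate for a binary diabetes-positive/negative task.

```python
import time


def confusion_matrix(actual, predicted, positive=1):
    """Count true/false positives and negatives for a binary task."""
    counts = {"tp": 0, "tn": 0, "fp": 0, "fn": 0}
    for a, p in zip(actual, predicted):
        if p == positive:
            counts["tp" if a == positive else "fp"] += 1
        else:
            counts["fn" if a == positive else "tn"] += 1
    return counts


def evaluate(predict_fn, features, actual):
    """Return confusion matrix, accuracy, error rate, and prediction time."""
    start = time.perf_counter()
    predicted = [predict_fn(x) for x in features]
    elapsed = time.perf_counter() - start

    cm = confusion_matrix(actual, predicted)
    total = sum(cm.values())
    accuracy = (cm["tp"] + cm["tn"]) / total
    return {
        "confusion": cm,
        "accuracy": accuracy,
        "error_rate": 1.0 - accuracy,  # error rate is the complement of accuracy
        "seconds": elapsed,
    }


# Hypothetical toy data, NOT taken from PIDD: one glucose reading per person
# and a label (1 = diabetes positive, 0 = diabetes negative).
glucose = [85, 168, 122, 90, 150, 101]
labels = [0, 1, 1, 0, 1, 0]


def threshold_rule(value):
    """Stand-in for a trained classifier (assumed cut-off, for illustration)."""
    return 1 if value >= 125 else 0


result = evaluate(threshold_rule, glucose, labels)
print(result["confusion"], result["accuracy"], result["error_rate"])
```

Running the same `evaluate` call once per classifier would give one row of such a comparison per algorithm; the timing here covers prediction only, so training time would need to be measured separately.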
