Comparison of Machine Learning With Logistic Regression for Prediction of Chronic Kidney Disease in the Thai Adult Population

Ratchainant Thammasudjarit; Punnathorn Ingsathit; Sigit Ari Saputro; Atiporn Ingsathit; Ammarin Thakkinstian

Open Access

Publication year - 2021

Publication title - Ramathibodi Medical Journal

Language(s) - English

Resource type - Journals

ISSN - 2651-0561

DOI - 10.33165/rmj.2021.44.4.250334

Subject(s) - logistic regression, overfitting, statistics, random forest, decision tree, kidney disease, naive bayes classifier, population, predictive modelling, machine learning, artificial intelligence, artificial neural network, computer science, medicine, mathematics, support vector machine, environmental health

Background: Chronic kidney disease (CKD) consumes substantial treatment resources. Early detection with a risk prediction model should help identify at-risk patients and allow early treatment.

Objective: To compare the performance of traditional logistic regression with that of machine learning (ML) in predicting the risk of CKD in the Thai population.

Methods: This study used data from the Thai Screening and Early Evaluation of Kidney Disease (SEEK) study. Seventeen features were initially considered for constructing prediction models using logistic regression and 4 ML methods (Random Forest, Naïve Bayes, Decision Tree, and Neural Network). Data were split into training and test sets at a ratio of 70:30. Model performance was assessed by estimating recall, C statistic, accuracy, F1, and precision.

Results: Seven of the 17 features were included in the prediction models. The logistic regression model discriminated CKD from non-CKD patients well, with C statistics of 0.79 and 0.78 in the training and test data, respectively. Among the ML models, the Neural Network performed best, followed by Random Forest, Naïve Bayes, and Decision Tree, with corresponding C statistics of 0.82, 0.80, 0.78, and 0.77 in the training data. In the test data, the performance of these models decreased by about 5%, 3%, 1%, and 2%, respectively, compared with a decrease of about 2% for the logistic model.

Conclusions: A CKD risk prediction model constructed from the logit equation may yield better discrimination and a lower tendency to overfit than ML models, including the Neural Network and Random Forest.
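The performance measures named in the Methods (recall, C statistic, accuracy, F1, and precision) can be illustrated with a minimal sketch in plain Python. This is not the authors' code: the abstract does not state which software or classification threshold was used, so the function names, the 0.5 threshold, and the toy labels and scores below are all assumptions made for illustration only.

```python
from itertools import product


def c_statistic(y_true, scores):
    """C statistic (area under the ROC curve) by pairwise comparison.

    It is the fraction of (positive, negative) pairs in which the positive
    case receives the higher score; tied scores count as half a win.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p, n in product(pos, neg)
    )
    return wins / (len(pos) * len(neg))


def threshold_metrics(y_true, scores, threshold=0.5):
    """Recall, precision, F1, and accuracy at an assumed cut-off."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    accuracy = (tp + tn) / len(pairs)
    return {
        "recall": recall,
        "precision": precision,
        "f1": f1,
        "accuracy": accuracy,
    }


# Toy data (hypothetical, not from the SEEK data set):
# 1 = CKD, 0 = non-CKD; scores are predicted risks from some model.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]

print("C statistic:", c_statistic(y_true, scores))
print("Threshold metrics:", threshold_metrics(y_true, scores))
```

The C statistic depends only on how the model ranks patients, whereas recall, precision, F1, and accuracy change with the chosen cut-off. Computing each measure once on the training split and once on the test split gives the kind of train-versus-test comparison the Results describe.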
