Handling missing data in a rheumatoid arthritis registry using random forest approach

Author(s) - Alsaber Ahmad, AlHerz Adeeba, Pan Jiazhu, ALSultan Ahmad T., Mishra Divya

Publication year - 2021

Publication title - International Journal of Rheumatic Diseases

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 0.795

H-Index - 41

eISSN - 1756-185X

pISSN - 1756-1841

DOI - 10.1111/1756-185x.14203

Subject(s) - missing data, imputation (statistics), categorical variable, statistics, mean squared error, medicine, random forest, data mining, mathematics, computer science, artificial intelligence

Missing data in clinical epidemiological research violate the intention-to-treat principle, reduce statistical power, and can introduce bias if the cause of the missingness is related to a patient's response to treatment. Multiple imputation offers a way to predict the missing values. The main objective of this study is to estimate and impute missing values in patient records from the Kuwait Registry for Rheumatic Diseases. Several imputation methods were implemented, and the best method was taken to be the one with the lowest root mean square error (RMSE). Among 1735 rheumatoid arthritis patients, the proportion of missing values varied from 5% to 65.5% of the total observations. The results show that the sequential random forest method can estimate these missing values with a high level of accuracy. The RMSE varied between 2.5 and 5.0. missForest had the lowest imputation error for both continuous and categorical variables at each missing data rate (10%, 20%, and 30%) and had the smallest prediction error difference when the models used the imputed laboratory values.
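The evaluation idea in the abstract (hide known values at a fixed rate, impute them, and score each method by RMSE on the hidden cells) can be illustrated with a minimal sketch. This is not the authors' code. missForest is an R package; the sketch approximates a random forest imputer with scikit-learn's `IterativeImputer` wrapped around a `RandomForestRegressor`, handles continuous variables only, and compares it with mean imputation as a baseline. The data are synthetic, the values are masked completely at random (an assumption; the abstract does not say how missingness was simulated), and all function names are hypothetical.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (enables IterativeImputer)
from sklearn.impute import IterativeImputer, SimpleImputer
from sklearn.ensemble import RandomForestRegressor


def make_synthetic_data(n_rows=300, seed=0):
    """Synthetic, correlated continuous columns (NOT registry data)."""
    rng = np.random.default_rng(seed)
    x1 = rng.normal(50, 10, n_rows)
    x2 = 0.8 * x1 + rng.normal(0, 3, n_rows)
    x3 = 0.5 * x1 - 0.3 * x2 + rng.normal(0, 2, n_rows)
    return np.column_stack([x1, x2, x3])


def mask_completely_at_random(X, rate, seed=0):
    """Hide roughly `rate` of the cells; return the masked copy and the mask."""
    rng = np.random.default_rng(seed)
    mask = rng.random(X.shape) < rate
    X_missing = X.copy()
    X_missing[mask] = np.nan
    return X_missing, mask


def imputation_rmse(X_true, X_imputed, mask):
    """RMSE computed only over the cells that were hidden."""
    diff = X_true[mask] - X_imputed[mask]
    return float(np.sqrt(np.mean(diff ** 2)))


def compare_imputers(rate, seed=0):
    """Return {method name: RMSE} for one missing-data rate."""
    X = make_synthetic_data(seed=seed)
    X_missing, mask = mask_completely_at_random(X, rate, seed=seed + 1)
    imputers = {
        "mean": SimpleImputer(strategy="mean"),
        # Assumption: a scikit-learn stand-in for a missForest-style imputer.
        "random_forest": IterativeImputer(
            estimator=RandomForestRegressor(n_estimators=30, random_state=seed),
            max_iter=5,
            random_state=seed,
        ),
    }
    return {
        name: imputation_rmse(X, imputer.fit_transform(X_missing), mask)
        for name, imputer in imputers.items()
    }


if __name__ == "__main__":
    # Missing-data rates taken from the abstract.
    for rate in (0.10, 0.20, 0.30):
        print(rate, compare_imputers(rate))
```

The method with the lowest RMSE at a given rate would be preferred under the criterion described above. Categorical variables, which the study also imputed, would need a classifier-based imputer and a different error measure, and are outside this sketch.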
