Open Access
Machine Learning for Prediction of Patients on Hemodialysis with an Undetected SARS-CoV-2 Infection
Author(s) -
Caitlin K. Monaghan,
John Larkin,
Sheetal Chaudhuri,
Hao Han,
Yanmei Jiao,
Kristine Marie Bermudez,
Eric D. Weinhandl,
Ines A. Dahne-Steuber,
Kathleen Belmonte,
Luca Neri,
Peter Kotanko,
Jeroen P. Kooman,
Jeffrey Hymes,
Robert J. Kossmann,
Len Usvyat,
Franklin W. Maddux
Publication year - 2021
Publication title - Kidney360
Language(s) - English
Resource type - Journals
ISSN - 2641-7650
DOI - 10.34067/kid.0003802020
Subject(s) - medicine, cohort, receiver operating characteristic, false positive paradox, population, covid-19, cutoff, area under the curve, hemodialysis, statistics, artificial intelligence, emergency medicine, computer science, mathematics, disease, physics, environmental health, quantum mechanics, infectious disease (medical specialty)
Background: We developed a machine learning (ML) model that predicts the risk that a patient on hemodialysis (HD) has an undetected SARS-CoV-2 infection that is identified ≥3 days later.

Methods: As part of a healthcare operations effort, we used patient data from a national network of dialysis clinics (February–September 2020) to develop an ML model (XGBoost) that uses 81 variables to predict the likelihood that an adult patient on HD has an undetected SARS-CoV-2 infection that is identified in the subsequent ≥3 days. We used a 60%:20%:20% randomized split of COVID-19–positive samples for the training, validation, and testing datasets.

Results: We used a select cohort of 40,490 patients on HD to build the ML model (11,166 patients who were COVID-19 positive and 29,324 patients who were unaffected controls). The prevalence of COVID-19 in the cohort (28% COVID-19 positive) was, by design, higher than in the HD population. The prevalence of COVID-19 was set to 10% in the testing dataset to approximate the prevalence observed in the national HD population. The threshold for classifying observations as positive or negative was set at 0.80 to minimize false positives. In the testing dataset, the precision of the model was 0.52, the recall was 0.07, and the lift was 5.3. The area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC) were 0.68 and 0.24, respectively, in the testing dataset. The top predictors of a patient on HD having a SARS-CoV-2 infection were the change in interdialytic weight gain from the previous month, the mean pre-HD body temperature in the prior week, and the change in post-HD heart rate from the previous month.

Conclusions: The developed ML model appears suitable for predicting patients on HD at risk of having COVID-19 at least 3 days before there would be a clinical suspicion of the disease.
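To make the reported metrics concrete, the following is a minimal sketch of how precision, recall, and lift relate once a 0.80 threshold is applied to model scores. It is not the authors' code. It assumes that lift is defined as precision divided by prevalence (the abstract does not give a definition) and that a score at or above the threshold counts as positive. The function names and the toy scores are hypothetical.

```python
from typing import List, Sequence, Tuple


def classify(scores: Sequence[float], threshold: float = 0.80) -> List[int]:
    """Label a score as positive (1) when it is at or above the threshold.

    Assumption: the abstract only says the threshold was "set at 0.80";
    treating the boundary as inclusive is a choice made for this sketch.
    """
    return [1 if s >= threshold else 0 for s in scores]


def precision_recall_lift(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float, float]:
    """Return (precision, recall, lift) for binary labels.

    Assumption: lift = precision / prevalence, i.e. how much more often a
    flagged patient is truly positive than a randomly chosen patient.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    prevalence = sum(y_true) / len(y_true)
    lift = precision / prevalence if prevalence else 0.0
    return precision, recall, lift


# Toy data (made up): 20 patients, 2 true positives, i.e. 10% prevalence.
y_true = [1, 1] + [0] * 18
scores = [0.85, 0.40, 0.82] + [0.30] * 17

y_pred = classify(scores, threshold=0.80)
precision, recall, lift = precision_recall_lift(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f} lift={lift:.1f}")
```

Under the same assumed definition, the reported precision of 0.52 at 10% prevalence gives a lift of about 5.2, which is close to the reported 5.3; the small gap is plausibly due to rounding of the published figures. The high threshold explains the pattern in the results: few patients are flagged (recall 0.07), but roughly half of those flagged are true positives.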
