2107. Decision Trees vs. Neural Networks for Supervised Machine Learning-Based Prediction of Healthcare-Associated Urinary Tract Infections

Philip Zachariah, Elioth Mirsha Sanabria Buenaventura, Jianfang Liu, Bevin Cohen, David D. Yao, Elaine Larson

Open Access

Publication year - 2018

Publication title - Open Forum Infectious Diseases

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 1.546

H-Index - 35

ISSN - 2328-8957

DOI - 10.1093/ofid/ofy210.1763

Subject(s) - medicine , machine learning , test set , decision tree , artificial neural network , artificial intelligence , sigmoid function , cohort , emergency medicine , computer science

Background Supervised machine learning (SML)-based methods could facilitate early prediction of healthcare-related adverse events. The role of SML in stratifying patient risk of infectious events during hospitalization, and its performance when limited to standardized, widely available predictors, is less well known. Using a large cohort of adult inpatients, we used SML techniques to predict a diagnosis of urinary tract infection (UTI) during hospitalization.
Methods We used previously validated data from adults (≥18 years old) hospitalized between 2009 and 2016 in a healthcare system as part of a federally funded study. The outcome was a UTI detected >2 days after admission. Predictors measured clinical complexity, history of healthcare-associated complications, and specific risk factors for UTI. Predictors were restricted to those standardized and readily obtainable across facilities (e.g., ICD codes). Two SML methods, neural networks (NN) and decision trees (DT), were used. The NN used two hidden layers and a sigmoid output function. The DT used binary recursive partitioning with the Gini index to measure node impurity. The training set comprised 60% of available hospitalizations, and the remaining 40% served as the test set for validation. Cross-validation was used to refine the model. Oversampling was used to adjust for the rare outcome. The area under the curve (AUC) on the test set measured model performance.
Results From a total of 897,344 hospitalizations, 16,069 UTIs were identified in the data set during the study period. Applying NN and DT to the raw data set, AUCs of 0.55 and 0.69, respectively, were achieved on the test set. With oversampling, model performance for NN and DT improved to 0.77 and 0.78, respectively, outperforming traditional logistic regression (Figure 1). The optimal DT is presented (Figure 2).
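As an illustration of the decision-tree step described in the Methods, the following is a minimal sketch of how one binary split can be chosen by minimizing Gini impurity. It is not the authors' implementation. The function names, the single numeric predictor `catheter_days`, and the toy labels are all hypothetical and are not taken from the study data.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity 1 - sum_k p_k^2 for a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_binary_split(values, labels):
    """Return (threshold, weighted child impurity) for the best split x <= t.

    Only one numeric predictor is considered; a full tree would repeat this
    search over all predictors and recurse into each child node.
    """
    n = len(labels)
    best_threshold, best_score = None, gini_impurity(labels)
    # Skip the largest value: splitting there would leave the right child empty.
    for t in sorted(set(values))[:-1]:
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        score = (len(left) * gini_impurity(left)
                 + len(right) * gini_impurity(right)) / n
        if score < best_score:
            best_threshold, best_score = t, score
    return best_threshold, best_score


# Hypothetical toy data (assumption, not study data).
catheter_days = [0, 1, 2, 5, 6, 8]
uti = [0, 0, 0, 1, 1, 1]
threshold, impurity = best_binary_split(catheter_days, uti)
print(threshold, impurity)
```

On this toy input, the split `catheter_days <= 2` separates the two classes completely, so the weighted impurity of the children drops to zero. Recursive partitioning applies the same search again inside each child node until a stopping rule is met.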
Conclusion Reasonable prediction performance for an infectious event during hospitalization was achieved using a limited set of routinely available and standardized variables. While both SML methods had comparable performance, the DT was more interpretable. Further work will extend these methods to other infectious events, use more specific EHR data, and link these predictions to interventions in real time.
Disclosures All authors: No reported disclosures.
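The outcome in this cohort is rare (16,069 UTIs in 897,344 hospitalizations, about 1.8%), which is why the Methods mention oversampling and why AUC is used to measure performance. The sketch below shows one way these two steps might look. The abstract does not specify the oversampling variant, so random duplication of minority-class rows, applied to the training set only, is an assumption here. The function names and toy data are hypothetical.

```python
import random


def oversample_minority(rows, labels, seed=0):
    """Randomly duplicate minority-class rows until both classes are equal in size.

    Assumes binary labels (0/1) with at least one example of each class.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    idx = pos + neg + extra
    rng.shuffle(idx)
    return [rows[i] for i in idx], [labels[i] for i in idx]


def auc(scores, labels):
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win. This pairwise form is quadratic in the sample
    size, so it is only suitable for small illustrative inputs.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical training data with one positive among five rows.
train_rows = [[0.1], [0.2], [0.3], [0.4], [0.9]]
train_labels = [0, 0, 0, 0, 1]
bal_rows, bal_labels = oversample_minority(train_rows, train_labels)

# Hypothetical model scores on an untouched test set.
test_scores = [0.1, 0.4, 0.35, 0.8]
test_labels = [0, 0, 1, 1]
test_auc = auc(test_scores, test_labels)
print(len(bal_labels), sum(bal_labels), test_auc)
```

Keeping the test set at its original class balance, as assumed here, means the reported AUC reflects the prevalence a deployed model would actually encounter.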
