Identification of social determinants of health using multi-label classification of electronic health record clinical notes

Open Access

Author(s) - Rachel Stemerman, Jaime Arguello, Jane H. Brice, Ashok Krishnamurthy, Mary I. Houston, Rebecca Rutherford Kitzmiller

Publication year - 2021

Publication title - JAMIA Open

Language(s) - English

Resource type - Journals

ISSN - 2574-2531

DOI - 10.1093/jamiaopen/ooaa069

Subject(s) - receiver operating characteristic, random forest, artificial intelligence, machine learning, support vector machine, electronic health record, classifier (uml), computer science, f1 score, documentation, recall, metric (unit), medicine, psychology, health care, economics, programming language, economic growth, operations management, cognitive psychology

Objectives: Social determinants of health (SDH), key contributors to health, are rarely measured and collected systematically in the electronic health record (EHR). We investigate how to leverage clinical notes using novel applications of multi-label learning (MLL) to classify SDH in mental health and substance use disorder patients who frequent the emergency department.

Methods and Materials: We labeled a gold-standard corpus of EHR clinical note sentences (N = 4063) with 6 SDH-related domains recommended by the Institute of Medicine for inclusion in the EHR. We then trained 5 classification models: linear Support Vector Machine, K-Nearest Neighbors, Random Forest, XGBoost, and bidirectional Long Short-Term Memory (BI-LSTM). To compare the models on MLL classification, we used the F1 score as the primary evaluation metric alongside 5 common evaluation measures: accuracy, average precision–recall (AP), area under the receiver operating characteristic curve (AUC-ROC), Hamming loss, and log loss.

Results: Overall, BI-LSTM outperformed the other classification models in terms of AUC-ROC (93.9%), AP (0.76), and Hamming loss (0.12). The AUC-ROC values of the MLL models across SDH-related domains ranged from 0.59 to 1.0. We found that 44.6% of our study population (N = 1119) had at least one positive documentation of SDH.

Discussion and Conclusion: Training an MLL model on an SDH-rich data source can produce a high-performing classifier using only unstructured clinical notes. We also provide evidence that model performance is associated with the lexical diversity used by health professionals and with the auto-generation of clinical note sentences to document SDH.
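To make some of the multi-label evaluation measures named in the abstract concrete, the following is a minimal sketch in plain Python, not the authors' code. It computes Hamming loss, an F1 score, and accuracy on toy data. Several choices here are assumptions made for illustration: the label matrices are invented, only 3 label columns are used rather than the 6 SDH-related domains, F1 is micro-averaged, and accuracy is taken to mean exact-match (subset) accuracy. The abstract does not say which averaging or accuracy variant the study used.

```python
from typing import Sequence

Matrix = Sequence[Sequence[int]]  # rows = sentences, columns = binary labels


def hamming_loss(y_true: Matrix, y_pred: Matrix) -> float:
    """Fraction of individual label slots that were predicted incorrectly."""
    wrong = 0
    total = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            total += 1
            wrong += int(t != p)
    return wrong / total


def subset_accuracy(y_true: Matrix, y_pred: Matrix) -> float:
    """Fraction of sentences whose full label set was predicted exactly."""
    exact = sum(1 for t, p in zip(y_true, y_pred) if list(t) == list(p))
    return exact / len(y_true)


def micro_f1(y_true: Matrix, y_pred: Matrix) -> float:
    """F1 computed from true/false positives pooled over all labels."""
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if t == 1 and p == 1:
                tp += 1
            elif t == 0 and p == 1:
                fp += 1
            elif t == 1 and p == 0:
                fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy data (invented): 4 sentences, 3 hypothetical label columns.
y_true = [[1, 0, 0], [0, 1, 1], [0, 0, 0], [1, 1, 0]]
y_pred = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0]]

print("Hamming loss:", hamming_loss(y_true, y_pred))
print("Subset accuracy:", subset_accuracy(y_true, y_pred))
print("Micro F1:", micro_f1(y_true, y_pred))
```

Hamming loss counts every label slot separately, so a sentence with one wrong label out of several is only partly penalized, whereas subset accuracy treats that sentence as entirely wrong. This is one reason multi-label studies typically report several measures side by side.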
