Dual‐stream algorithms for dementia detection: Harnessing structured and unstructured electronic health record data, a novel approach to prevalence estimation
Author(s) -
Collyer Taya A.,
Liu Ming,
Beare Richard,
Andrew Nadine E.,
Ung David,
Carver Alison,
Ilomaki Jenni,
Bell J. Simon,
Thrift Amanda G.,
Rocca Walter A.,
St Sauver Jennifer L.,
Lu Alicia,
Siostrom Kristy,
Moran Chris,
Roberts Helene,
Chong Trevor T.J.,
Murray Anne,
Ravipati Tanya,
O'Bree Bridget,
Srikanth Velandai K.
Publication year - 2025
Publication title -
Alzheimer's & Dementia
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 6.713
H-Index - 118
eISSN - 1552-5279
pISSN - 1552-5260
DOI - 10.1002/alz.70132
Abstract

INTRODUCTION: Identifying individuals with dementia is crucial for prevalence estimation and service planning, but reliable, scalable methods are lacking. We developed a novel set of algorithms using both structured and unstructured electronic health record (EHR) data, applying Diagnostic and Statistical Manual of Mental Disorders criteria for dementia case identification.

METHODS: Our cohort (n = 1082) included individuals aged ≥ 60 with dementia identified through specialist clinics and a comparison group without dementia. Clinicians from Australia and the United States informed predictor selection. We developed algorithms through a biostatistics stream for structured data and a natural language processing (NLP) stream for text, synthesizing results via logistic regression.

RESULTS: The final structured model retained 16 variables (area under the receiver operating characteristic curve [AUC] 0.853, specificity 72.2%, sensitivity 80.6%). NLP classifiers (logistic regression, support vector machine, and random forest models) performed comparably. The final, combined model outperformed all others (AUC = 0.951, P < 0.001 for comparison to structured model).

DISCUSSION: Embedding text‐derived insights within algorithms trained on structured medical data significantly enhances dementia identification capacity.

Highlights

Algorithmic tools for detection of individuals with dementia are available; however, previous work has used heterogeneous case definitions that are not clinically meaningful and has relied on proxies such as diagnostic codes or medications for case ascertainment.

We used a novel, dual‐stream algorithmic development approach, simultaneously and separately modeling a clinically meaningful outcome (diagnosis of dementia according to specialized clinical impression) using structured and unstructured electronic health record datasets.

Our clinically grounded case definition supported the inclusion of key structured variables (such as dementia International Classification of Disease codes and medications) as modeling predictors rather than outcomes.

Our algorithms, published in detail to support validation and replication, represent a major step forward in the use of routinely collected data for detection of diagnosed dementia.
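The dual‐stream idea in the Methods (one score from structured data, one from text, synthesized via logistic regression and compared by AUC) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the predictors, features, or fitting procedure. The six toy patients, both score vectors, and all function names below are hypothetical, and the plain gradient-descent fit stands in for whatever software the study used.

```python
import math

def auc(scores, labels):
    """AUC as the share of (case, non-case) pairs where the case scores higher; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def sens_spec(scores, labels, threshold=0.5):
    """Sensitivity and specificity when scores >= threshold are called positive."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(rows, labels, lr=0.5, epochs=2000):
    """Fit an unregularized logistic regression by batch gradient descent."""
    w = [0.0] * len(rows[0])
    b = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for x, y in zip(rows, labels):
            err = sigmoid(b + sum(wi * xi for wi, xi in zip(w, x))) - y
            grad_b += err
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
        b -= lr * grad_b / n
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
    return w, b

def predict(w, b, rows):
    """Predicted probability of dementia for each row of stream scores."""
    return [sigmoid(b + sum(wi * xi for wi, xi in zip(w, x))) for x in rows]

# Hypothetical toy data: 1 = dementia, 0 = comparison group.
labels = [1, 1, 1, 0, 0, 0]
structured_scores = [0.9, 0.4, 0.8, 0.5, 0.2, 0.1]  # invented structured-stream output
nlp_scores = [0.3, 0.9, 0.7, 0.1, 0.6, 0.2]         # invented NLP-stream output

# Synthesis step: the two stream scores become the predictors of a second model.
stacked_rows = list(zip(structured_scores, nlp_scores))
weights, bias = fit_logistic(stacked_rows, labels)
combined_scores = predict(weights, bias, stacked_rows)

print("structured AUC:", auc(structured_scores, labels))
print("NLP AUC:", auc(nlp_scores, labels))
print("combined AUC:", auc(combined_scores, labels))
```

The combined model here is evaluated on the same rows it was fitted to, which overstates performance; a real analysis would use held-out data or cross-validation and a formal test for the AUC difference.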