Automated detection of sudden unexpected death in epilepsy risk factors in electronic medical records using natural language processing
Author(s) - Barbour Kristen, Hesdorffer Dale C., Tian Niu, Yozawitz Elissa G., McGoldrick Patricia E., Wolf Steven, McDonough Tiffani L., Nelson Aaron, Loddenkemper Tobias, Basma Natasha, Johnson Stephen B., Grinspan Zachary M.
Publication year - 2019
Publication title - Epilepsia
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 2.687
H-Index - 191
eISSN - 1528-1167
pISSN - 0013-9580
DOI - 10.1111/epi.15966
Subject(s) - generalizability theory, epilepsy, medicine, medical record, boilerplate text, veterans affairs, emergency medicine, psychology, computer science, psychiatry, surgery, programming language, developmental psychology
Objective Sudden unexpected death in epilepsy (SUDEP) is an important cause of mortality in epilepsy. However, there is a gap in how often providers counsel patients about SUDEP. One potential solution is to electronically prompt clinicians to provide counseling via automated detection of risk factors in electronic medical records (EMRs). We evaluated (1) the feasibility and generalizability of using regular expressions to identify risk factors in EMRs and (2) barriers to generalizability.
Methods Data included physician notes for 3000 patients from one medical center (home) and 1000 from five additional centers (away). Through chart review, we identified three SUDEP risk factors: (1) generalized tonic–clonic seizures, (2) refractory epilepsy, and (3) epilepsy surgery candidacy. Regular expressions for the risk factors were manually created with home training data, and performance was evaluated with home test and away test data. Performance was measured by sensitivity, positive predictive value, and F-measure. Generalizability was defined as an absolute decrease in performance of <0.10 for away versus home test data. To evaluate underlying barriers to generalizability, we identified causes of errors seen more often in away data than in home data. To demonstrate how small revisions can improve generalizability, we removed three "boilerplate" standard text phrases from away notes and repeated the performance evaluation.
Results We observed high performance in home test data (F-measure range = 0.86-0.90), and low to high performance in away test data (F-measure range = 0.53-0.81). After removing three boilerplate phrases, away performance improved (F-measure range = 0.79-0.89) and generalizability was achieved for nearly all measures. The only significant barrier to generalizability was use of boilerplate phrases, causing 104 of 171 errors (61%) in away data.
Significance Regular expressions are a feasible and probably a generalizable method to identify variables related to SUDEP risk. Our methods may be implemented to create large patient cohorts for research and to generate electronic prompts for SUDEP counseling.
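The workflow described in the Methods can be illustrated with a minimal Python sketch. It is not the authors' implementation: the regular expressions, the boilerplate phrase, and all function names below are hypothetical, since the abstract does not list the actual patterns or phrases. The F-measure is assumed here to be the harmonic mean of sensitivity and positive predictive value, and the generalizability check applies the stated criterion of an absolute decrease of <0.10 from home to away data.

```python
import re

# Hypothetical patterns for the three risk factors named in the abstract.
# The study's actual regular expressions are not given there.
RISK_PATTERNS = {
    "gtc_seizures": re.compile(
        r"\b(generalized\s+tonic[-\s\u2013]clonic|GTCS?|grand\s+mal)\b",
        re.IGNORECASE,
    ),
    "refractory_epilepsy": re.compile(
        r"\b(refractory|intractable|drug[-\s]resistant)\s+(epilepsy|seizures?)\b",
        re.IGNORECASE,
    ),
    "surgery_candidacy": re.compile(
        r"\b(surgical|surgery)\s+candida(te|cy)\b"
        r"|\bcandidate\s+for\s+epilepsy\s+surgery\b",
        re.IGNORECASE,
    ),
}

# Hypothetical boilerplate sentence of the kind that can trigger false positives.
BOILERPLATE = [
    "Counseled on seizure precautions, including risk of "
    "generalized tonic-clonic seizures.",
]


def strip_boilerplate(note, phrases=BOILERPLATE):
    """Remove known standard-text phrases before pattern matching."""
    for phrase in phrases:
        note = note.replace(phrase, " ")
    return note


def detect(note):
    """Return, per risk factor, whether its pattern matches the note."""
    return {name: bool(p.search(note)) for name, p in RISK_PATTERNS.items()}


def score(predicted, actual):
    """Sensitivity, PPV, and F-measure from paired boolean labels."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)
    fp = sum(1 for p, a in pairs if p and not a)
    fn = sum(1 for p, a in pairs if not p and a)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    ppv = tp / (tp + fp) if tp + fp else 0.0
    total = sensitivity + ppv
    f_measure = 2 * sensitivity * ppv / total if total else 0.0
    return {"sensitivity": sensitivity, "ppv": ppv, "f_measure": f_measure}


def generalizes(home, away, threshold=0.10):
    """True if every measure drops by less than `threshold` from home to away."""
    return all(home[k] - away[k] < threshold for k in home)


if __name__ == "__main__":
    note = ("Patient has refractory epilepsy. " + BOILERPLATE[0])
    print(detect(note))                     # flags before boilerplate removal
    print(detect(strip_boilerplate(note)))  # flags after boilerplate removal
```

In this sketch, the boilerplate sentence mentions generalized tonic-clonic seizures without describing the patient, so matching on the raw note flags that risk factor spuriously; stripping the phrase first removes the false positive, which mirrors the kind of error the Results attribute to boilerplate text.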