Automated Electronic Health Record–Based Tool for Identification of Patients With Metastatic Disease to Facilitate Clinical Trial Patient Ascertainment
Author(s) -
Jeffrey J. Kirshner,
Kelly Cohn,
Steven Dunder,
Karri Donahue,
Madeline Richey,
Peter Larson,
Lauren Sutton,
Evelyn Siu,
J. Oliver Donegan,
Zexi Chen,
Caroline Nightingale,
Melissa Estévez,
James Hamrick
Publication year - 2021
Publication title - JCO Clinical Cancer Informatics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.188
H-Index - 12
ISSN - 2473-4276
DOI - 10.1200/cci.20.00180
Subject(s) - medicine, logistic regression, workflow, identification (biology), electronic health record, artificial intelligence, set (abstract data type), machine learning, oncology, health care, computer science, database, botany, economics, biology, programming language, economic growth
PURPOSE To facilitate identification of candidates for clinical trial participation, we developed a machine learning tool that automates the determination of a patient's metastatic status on the basis of unstructured electronic health record (EHR) data.

METHODS This tool scans EHR documents, extracting text snippet features surrounding key words (such as metastatic, progression, and local). A regularized logistic regression model was trained and used to classify patients across five metastatic categories: highly likely positive, likely positive, highly likely negative, likely negative, and unknown. Using a real-world oncology database of patients with solid tumors with manually abstracted information as reference, we calculated sensitivity, specificity, negative predictive value (NPV), and positive predictive value (PPV). We validated the performance in a real-world data set, evaluating accuracy gains upon additional user review of the tool's outputs after integration into clinic workflows.

RESULTS In the training data set (N = 66,532), the model sensitivity and specificity (% [95% CI]) were 82.4 [81.9 to 83.0] and 95.5 [95.3 to 96.7], respectively; the PPV was 89.3 [88.8 to 90.0], and the NPV was 94.0 [93.8 to 94.2]. In the validation sample (n = 200 from five distinct care sites), after user review of model outputs, values increased to 97.1 [85.1 to 99.9] for sensitivity, 98.2 [94.8 to 99.6] for specificity, 91.9 [78.1 to 98.3] for PPV, and 99.4 [96.6 to 100.0] for NPV. The model assigned 163 of 200 patients to the highly likely categories. The error prevalence was 4% before and 2% after user review.

CONCLUSION This tool infers metastatic status from unstructured EHR data with high accuracy and high confidence in more than 75% of cases, without requiring additional manual review. By enabling efficient characterization of metastatic status, this tool could mitigate a key barrier for patient ascertainment and clinical trial participation in community clinics.
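
As a rough illustration of two steps named in the methods (extracting text snippets around key words, and computing sensitivity, specificity, PPV, and NPV against a reference standard), the following Python sketch may help. It is not the authors' implementation: the function names, the word-window size, and the tokenization are assumptions made for this example, and the abstract does not say how snippets were delimited or turned into model features.

```python
import re

# Example key words taken from the abstract; the full list used by the
# authors is not given there.
KEYWORDS = ("metastatic", "progression", "local")


def extract_snippets(text, keywords=KEYWORDS, window=5):
    """Return (keyword, snippet) pairs for each key word found in text.

    A snippet is the key word plus up to `window` whitespace-separated
    words on either side. The window size is a hypothetical choice.
    """
    words = text.split()
    snippets = []
    for i, word in enumerate(words):
        # Strip punctuation and lowercase before comparing to key words.
        token = re.sub(r"\W+", "", word).lower()
        if token in keywords:
            start = max(0, i - window)
            snippets.append((token, " ".join(words[start:i + window + 1])))
    return snippets


def diagnostic_metrics(tp, fp, tn, fn):
    """Compute the four reported metrics from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # true positives among all positives
        "specificity": tn / (tn + fp),  # true negatives among all negatives
        "ppv": tp / (tp + fp),          # correct among predicted positives
        "npv": tn / (tn + fn),          # correct among predicted negatives
    }
```

In a pipeline like the one described, the snippets would presumably be vectorized and passed to the regularized logistic regression model, and the confusion-matrix counts would come from comparing model labels with the manually abstracted reference.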
