Natural Language Processing for the Extraction of Patient Symptoms during Cancer Radiotherapy
Author(s) - Hong J.
Publication year - 2020
Publication title - Health Services Research
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.706
H-Index - 121
eISSN - 1475-6773
pISSN - 0017-9124
DOI - 10.1111/1475-6773.13387
Subject(s) - common terminology criteria for adverse events , medicine , snomed ct , medical physics , inter rater reliability , data extraction , medline , grading (engineering) , radiation therapy , natural language processing , artificial intelligence , terminology , computer science , psychology , rating scale , developmental psychology , linguistics , philosophy , civil engineering , political science , law , engineering
On‐treatment evaluation of patients undergoing radiation therapy (RT) and chemoradiation (CRT) is important for managing symptoms related to disease, RT, or systemic therapy. Automated extraction of clinical symptoms from free‐text documentation can enable the implementation of machine learning (ML) or artificial intelligence (AI) tools, such as our previously developed pretreatment ML algorithm to predict emergency department (ED) visits and hospitalization during treatment. We present an analysis of extracting on‐treatment symptom data from clinical notes via a natural language processing (NLP) pipeline.
We obtained free‐text note data for 6,918 outpatient RT or CRT courses for adult patients (for any indication) at Duke from 2013 to 2016. The Apache clinical Text Analysis and Knowledge Extraction System (cTAKES) default clinical pipeline was used to extract SNOMED terms identified as explicitly present, absent, or not mentioned. These were converted to NCI Common Terminology Criteria for Adverse Events (CTCAE) v5.0 terms via the Observational Health Data Sciences and Informatics (OHDSI) Athena vocabulary. CTCAE is the current standard for oncology toxicity encoding and grading. Performance was evaluated in 100 randomly selected notes against gold standard manual abstraction of CTCAE toxicities by two senior radiation oncology residents, with adjudication by an attending radiation oncologist. Reviewers were instructed to identify all mentioned symptoms and were blinded to each other’s identifications. We created a thesaurus to harmonize overlapping CTCAE terms. Interrater reliability (IRR) was assessed using unweighted and weighted Cohen’s kappa coefficients between reviewers and versus the consensus. Symptoms detected in notes with both positive and negative mentions were considered positive. Sensitivity and specificity were calculated on a per‐symptom basis.
The study population comprised clinical notes for patients undergoing cancer RT.
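The per‐symptom evaluation described above can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the tuple layout for mentions, and the toy data are all hypothetical, and it assumes each note carries one present/absent label per symptom after applying the stated rule that a note with both positive and negative mentions counts as positive.

```python
def note_level_labels(mentions):
    """Collapse (note_id, symptom, polarity) mentions to one label per
    (note_id, symptom). Polarity is "present" or "absent"; if a note has
    both kinds of mention for a symptom, the label is "present"."""
    labels = {}
    for note_id, symptom, polarity in mentions:
        key = (note_id, symptom)
        if polarity == "present" or labels.get(key) == "present":
            labels[key] = "present"
        else:
            labels[key] = "absent"
    return labels


def sensitivity_specificity(predicted, gold, note_ids, symptom):
    """Sensitivity and specificity for positive mentions of one symptom.
    A note with no label for the symptom is treated as not positive.
    Returns None for a metric whose denominator is zero."""
    tp = fp = tn = fn = 0
    for note_id in note_ids:
        pred_pos = predicted.get((note_id, symptom)) == "present"
        gold_pos = gold.get((note_id, symptom)) == "present"
        if pred_pos and gold_pos:
            tp += 1
        elif pred_pos:
            fp += 1
        elif gold_pos:
            fn += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else None
    specificity = tn / (tn + fp) if (tn + fp) else None
    return sensitivity, specificity


# Toy data (invented for illustration, not from the study).
nlp_mentions = [
    (1, "nausea", "present"),
    (1, "nausea", "absent"),   # mixed mentions in note 1 -> "present"
    (2, "nausea", "absent"),
    (3, "nausea", "present"),
]
gold_mentions = [
    (1, "nausea", "present"),
    (2, "nausea", "absent"),
    (3, "nausea", "absent"),
    (4, "nausea", "present"),
]

nlp_labels = note_level_labels(nlp_mentions)
gold_labels = note_level_labels(gold_mentions)
sens, spec = sensitivity_specificity(nlp_labels, gold_labels, [1, 2, 3, 4], "nausea")
print(sens, spec)  # 0.5 0.5
```

In this toy case, note 1 is a true positive, note 2 a true negative, note 3 a false positive, and note 4 a false negative, so both metrics come out to 0.5.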
One hundred notes representing diverse disease sites revealed disagreements between physician reviewers in symptom identification in 93 of 100 notes, with a median of 4 disagreements per note (range 1‐12). Unweighted kappa was 0.68 (95% CI 0.65‐0.71) and weighted kappa was 0.59 (0.22‐1.00). Based on consensus symptom identification, NLP had strong detection performance for a number of symptoms with positive mentions in notes, including radiation dermatitis (80% sensitivity, 98% specificity), fatigue (74%; 100%), and nausea (85%; 99%). Detection of pain (63%; 64%) was more limited. In contrast, negated mentions had low sensitivity across symptoms, such as radiation dermatitis (19%), pain (7%), and soft tissue fibrosis (0%).
Interobserver identification of acute toxicities during cancer therapy is highly variable. Natural language processing can provide systematic identification of toxicity during therapy, particularly for positive mentions. Computational detection of negated symptoms is more challenging and represents an area for continued development.
NLP can facilitate systematic automated characterization of adverse events during cancer therapy at scale. Including symptom information from clinical notes allows better characterization and understanding of nuances in patient symptom trajectories without any additional burden on the care team (eg, structured data capture or workflow adjustments). This creates real‐time opportunities for improved surveillance, quality measurement, and supportive care in clinical practice.
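For readers unfamiliar with the interrater statistic reported above, the following is a minimal sketch of unweighted Cohen's kappa in Python. It is illustrative only: the abstract does not state the unit of rating, so the sketch assumes one binary present/absent rating per note and symptom pair from each reviewer, and the function name and toy ratings are hypothetical. The weighted variant is not shown.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two raters over the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e is the agreement expected by chance from each rater's
    marginal category frequencies.
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_a)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    count_a = Counter(ratings_a)
    count_b = Counter(ratings_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        # Both raters used a single identical category; kappa is undefined.
        return float("nan")
    return (observed - expected) / (1.0 - expected)


# Toy ratings (invented): 1 = symptom identified, 0 = not identified.
reviewer_1 = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
reviewer_2 = [1, 0, 0, 0, 1, 0, 1, 1, 1, 0]
kappa = cohens_kappa(reviewer_1, reviewer_2)
print(round(kappa, 2))  # 0.6
```

Here the reviewers agree on 8 of 10 items (observed agreement 0.8) while chance agreement is 0.5, giving a kappa of about 0.6.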