Evaluation of Emergency Medical Text Processor, a System for Cleaning Chief Complaint Text Data
Author(s) - Travers, Debbie A.; Haas, Stephanie W.
Publication year - 2004
Publication title - Academic Emergency Medicine
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.221
H-Index - 124
eISSN - 1553-2712
pISSN - 1069-6563
DOI - 10.1197/j.aem.2004.08.012
Subject(s) - unified medical language system , medicine , term (time) , complaint , cohen's kappa , natural language processing , kappa , raw score , raw data , emergency department , artificial intelligence , information retrieval , computer science , machine learning , nursing , programming language , linguistics , philosophy , physics , quantum mechanics , political science , law
Abstract
Objectives: Emergency Medical Text Processor (EMT-P) version 1, a natural language processing system that cleans emergency department text (e.g., chst pn, chest pai), was developed to maximize extraction of standard terms (e.g., chest pain). The authors compared the number of standard terms extracted from raw chief complaint (CC) data with the number extracted from CC data cleaned with EMT-P, and evaluated the accuracy of EMT-P.
Methods: This cross-sectional observational study included CC text entries for all emergency department visits to three tertiary care centers in 2001. Terms were extracted from CC entries before and after cleaning with EMT-P. Descriptive statistics included the number and percentage of all entries (tokens) and all unique entries (types) that matched a standard term from the Unified Medical Language System (UMLS). An expert panel rated the accuracy of the CC-UMLS term matches; inter-rater reliability was measured with κ.
Results: The authors collected 203,509 CC entry tokens, of which 63,946 were unique entry types. For the raw data, 89,337 tokens (44%) and 5,081 types (8%) matched a standard term. After EMT-P cleaning, 168,050 tokens (83%) and 44,430 types (69%) matched a standard term. The expert panel reached consensus on 201 of the 222 CC-UMLS term matches reviewed (κ = 0.69–0.72). Ninety-six percent of the 201 matches were rated equivalent or related. Thirty-eight percent of the nonmatches were found to match UMLS concepts.
Conclusions: EMT-P version 1 is relatively accurate, and cleaning with EMT-P improved the CC-UMLS term match rate over raw data. The authors identified areas for improvement in future EMT-P versions and issues to be resolved in developing a standard CC terminology.
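The token/type match rates reported in the Results follow a simple computation: tokens are all entries (counting repeats), types are unique entries, and each is checked against the standard-term vocabulary before and after cleaning. The Python sketch below illustrates that computation; the EXPANSIONS table, STANDARD_TERMS set, and clean() function are illustrative stand-ins, not EMT-P's actual rules or the UMLS vocabulary.

```python
# Hypothetical sketch of EMT-P-style cleaning and token/type match rates.
# The expansion table and term list are toy stand-ins, not EMT-P's rules.

# Stand-in normalization table (abbreviation expansion, spelling correction).
EXPANSIONS = {
    "chst": "chest",
    "pn": "pain",
    "pai": "pain",
    "abd": "abdominal",
}

# Stand-in for the UMLS standard-term vocabulary.
STANDARD_TERMS = {"chest pain", "abdominal pain", "headache"}

def clean(entry: str) -> str:
    """Normalize a raw chief-complaint entry word by word."""
    return " ".join(EXPANSIONS.get(w, w) for w in entry.lower().split())

def match_rates(entries: list[str]) -> tuple[float, float]:
    """Return (token match rate, type match rate) against STANDARD_TERMS.

    Tokens are all entries with repetition; types are unique entries.
    """
    token_hits = sum(1 for e in entries if e in STANDARD_TERMS)
    types = set(entries)
    type_hits = sum(1 for t in types if t in STANDARD_TERMS)
    return token_hits / len(entries), type_hits / len(types)

raw = ["chst pn", "chest pai", "chest pain", "ha"]
cleaned = [clean(e) for e in raw]
print(match_rates(raw))      # (0.25, 0.25): low match rate on raw text
print(match_rates(cleaned))  # (0.75, 0.5): higher rates after cleaning
```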
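Inter-rater reliability for the expert panel's match ratings was measured with κ. A minimal sketch of the standard two-rater Cohen's κ computation follows; the three-category scale (equivalent/related/unrelated) is an assumption based on the abstract's wording, since the exact rating categories are not given.

```python
# Cohen's kappa for two raters over the same items: agreement corrected
# for the agreement expected by chance from each rater's label frequencies.
from collections import Counter

def cohen_kappa(r1: list[str], r2: list[str]) -> float:
    n = len(r1)
    p_o = sum(a == b for a, b in zip(r1, r2)) / n           # observed agreement
    c1, c2 = Counter(r1), Counter(r2)
    p_e = sum(c1[k] * c2[k] for k in c1) / (n * n)          # chance agreement
    return (p_o - p_e) / (1 - p_e)

# Hypothetical ratings on five CC-UMLS matches; categories assumed.
panel_a = ["equivalent", "related", "equivalent", "unrelated", "equivalent"]
panel_b = ["equivalent", "related", "related", "unrelated", "equivalent"]
print(round(cohen_kappa(panel_a, panel_b), 2))  # 0.69
```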