Open Access
Identifying COVID-19 Outbreaks From Contact-Tracing Interview Forms for Public Health Departments: Development of a Natural Language Processing Pipeline
Author(s) - John Caskey, Iain McConnell, Madeline Oguss, Dmitriy Dligach, Xie Rachel Kulikoff, Brittany Grogan, Crystal Gibson, Elizabeth Wimmer, Traci DeSalvo, Edwin E Nyakoe-Nyasani, Matthew M Churpek, Majid Afshar
Publication year - 2022
Publication title - JMIR Public Health and Surveillance
Language(s) - English
Resource type - Journals
ISSN - 2369-2960
DOI - 10.2196/36119
Subject(s) - outbreak, pipeline (software), computer science, recall, precision and recall, named entity recognition, contact tracing, covid-19, natural language processing, artificial intelligence, data mining, infectious disease (medical specialty), medicine, disease, psychology, engineering, virology, pathology, systems engineering, cognitive psychology, programming language, task (project management)
Background: In Wisconsin, COVID-19 case interview forms contain free-text fields that need to be mined to identify potential outbreaks for targeted policy making. We developed an automated pipeline to ingest the free text into a pretrained neural language model to identify businesses and facilities as outbreaks.

Objective: We aimed to examine the precision and recall of our natural language processing pipeline against existing outbreaks and potentially new clusters.

Methods: Data on cases of COVID-19 were extracted from the Wisconsin Electronic Disease Surveillance System (WEDSS) for Dane County between July 1, 2020, and June 30, 2021. Features from the case interview forms were fed into a Bidirectional Encoder Representations from Transformers (BERT) model that was fine-tuned for named entity recognition (NER). We also developed a novel location-mapping tool to provide addresses for relevant NER. Precision and recall were measured against manually verified outbreaks and valid addresses in WEDSS.

Results: There were 46,798 cases of COVID-19, with 4,183,273 total BERT tokens and 15,051 unique tokens. The recall and precision of the NER tool were 0.67 (95% CI 0.66-0.68) and 0.55 (95% CI 0.54-0.57), respectively. For the location-mapping tool, the recall and precision were 0.93 (95% CI 0.92-0.95) and 0.93 (95% CI 0.92-0.95), respectively. Across monthly intervals, the NER tool identified more potential clusters than were verified in WEDSS.

Conclusions: We developed a novel pipeline of tools that identified existing outbreaks and novel clusters with associated addresses. Our pipeline ingests data from a statewide database and may be deployed to assist local health departments for targeted interventions.
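
For readers unfamiliar with the metrics reported above, the following is a minimal sketch of how precision and recall with 95% confidence intervals can be computed from counts of true positives, false positives, and false negatives. The abstract does not state which interval method the authors used; the Wilson score interval here is an assumption, and the function names and the counts in the usage lines are hypothetical, not figures from the study.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a proportion (assumed method, not from the paper)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return center - half_width, center + half_width


def precision_recall(tp, fp, fn):
    """Return (point estimate, lower bound, upper bound) for precision and recall."""
    precision = tp / (tp + fp)  # share of predicted entities that were correct
    recall = tp / (tp + fn)     # share of true entities that were found
    return {
        "precision": (precision, *wilson_interval(tp, tp + fp)),
        "recall": (recall, *wilson_interval(tp, tp + fn)),
    }


# Hypothetical counts for illustration only; they are not taken from the study.
result = precision_recall(tp=80, fp=20, fn=40)
for name, (point, low, high) in result.items():
    print(f"{name}: {point:.2f} (95% CI {low:.2f}-{high:.2f})")
```

The Wilson interval is chosen in this sketch because it stays within [0, 1] even for proportions near the boundaries; a normal-approximation or bootstrap interval would be equally plausible choices.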
