Open Access
Natural Language Processing of Serum Protein Electrophoresis Reports in the Veterans Affairs Health Care System
Author(s) -
Justine Ryu,
Andrew J. Zimolzak
Publication year - 2020
Publication title -
JCO Clinical Cancer Informatics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.188
H-Index - 12
ISSN - 2473-4276
DOI - 10.1200/cci.19.00167
Subject(s) - monoclonal gammopathy, veterans affairs, artificial intelligence, receiver operating characteristic, computer science, university hospital, medicine, machine learning, natural language processing, monoclonal, monoclonal antibody, family medicine, immunology, antibody
PURPOSE Serum protein electrophoresis (SPEP) is a clinical tool used to screen for monoclonal gammopathy and is therefore critical in the evaluation of patients with multiple myeloma. However, SPEP laboratory results are usually returned as short text reports, which are not amenable to simple computerized processing for large-scale studies. We applied natural language processing (NLP) to detect monoclonal gammopathy in SPEP laboratory results and compared its performance at multiple hospitals using both a manual rules-based system and a machine-learning algorithm.

METHODS We used data from the VA Corporate Data Warehouse, which comprises data from 20 million unique individuals. SPEP reports were collected from July to December 2015 at 5 Veterans Affairs Medical Centers. We annotated 300 of these reports for the presence or absence of monoclonal gammopathy. We applied a machine learning–based NLP system and a manual rules-based NLP system to detect monoclonal gammopathy in SPEP reports at each of the hospitals, then applied the model from 1 hospital to each of the other hospitals.

RESULTS The learning system achieved an area under the receiver operating characteristic curve of 0.997, and the rules-based system achieved an accuracy of 0.99. When a model trained on 1 hospital’s data was applied to a different hospital, however, accuracy varied greatly, and the learning-based models performed better than the rules-based model.

CONCLUSION Binary classification of short clinical texts such as SPEP reports may be a particularly attractive target on which to train highly accurate NLP systems.
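
To make the rules-based approach concrete, the following is a minimal sketch of a keyword-plus-negation classifier for short report texts. It is not the authors' system: the abstract does not list the actual rules, so the term patterns, the negation cues, the function names `has_monoclonal_gammopathy` and `accuracy`, and the sample report strings are all hypothetical and written only for illustration.

```python
import re

# Hypothetical patterns: the rules actually used in the study are not given in the abstract.
POSITIVE = re.compile(
    r"\b(monoclonal|m[- ]?spike|m[- ]?protein|paraprotein)\b", re.IGNORECASE
)
# A negation cue anywhere earlier in the same sentence suppresses the positive term.
NEGATION = re.compile(r"\b(no|not|without|absent|negative for)\b", re.IGNORECASE)


def has_monoclonal_gammopathy(report: str) -> bool:
    """Return True if a positive term occurs with no negation cue before it in its sentence."""
    # Naive sentence split on '.' and ';' (this also splits decimals, which is
    # acceptable for a sketch but not for production use).
    for sentence in re.split(r"[.;]\s*", report):
        for match in POSITIVE.finditer(sentence):
            prefix = sentence[: match.start()]
            if not NEGATION.search(prefix):
                return True
    return False


def accuracy(predictions, labels) -> float:
    """Fraction of predictions that equal the annotated labels."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


if __name__ == "__main__":
    # Invented example reports, not real patient data.
    reports = [
        "No monoclonal protein detected.",
        "M-spike present in the gamma region.",
        "Polyclonal increase in gamma globulins; no M-spike seen.",
    ]
    labels = [False, True, False]
    predictions = [has_monoclonal_gammopathy(r) for r in reports]
    print(predictions, accuracy(predictions, labels))
```

Hand-written rules of this kind tend to encode one site's reporting phrases, which is one plausible reason a rules-based model may transfer less well between hospitals than a learned one.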
