TCMNER and PubMed: A Novel Chinese Character-Level-Based Model and a Dataset for TCM Named Entity Recognition
Author(s) - Zhi Liu, Changyong Luo, Zeyu Zheng, Yan Li, Dianzheng Fu, Xinzhu Yu, Jiawei Zhao
Publication year - 2021
Publication title - Journal of Healthcare Engineering
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.509
H-Index - 29
eISSN - 2040-2309
pISSN - 2040-2295
DOI - 10.1155/2021/3544281
Subject(s) - computer science , artificial intelligence , field (mathematics) , construct (python library) , named entity recognition , character (mathematics) , natural language processing , representation (politics) , traditional chinese medicine , deep learning , information retrieval , machine learning , medicine , alternative medicine , geometry , mathematics , management , pathology , politics , political science , pure mathematics , law , economics , programming language , task (project management)
Intelligent traditional Chinese medicine (TCM) has become a popular research field, driven by advances in deep learning. Important progress has been made on representative tasks such as the automatic diagnosis of TCM syndromes and diseases and the generation of TCM herbal prescriptions. However, one unavoidable issue still hinders progress: the lack of labeled samples, i.e., TCM medical records. Named entity recognition (NER) models trained on various TCM resources can effectively alleviate this problem and continuously increase the number of labeled TCM samples. In this work, on the basis of in-depth analysis, we argue that TCM NER performance can be improved by using character-level representation and tagging, and we propose a novel word-character integrated self-attention module. With the help of TCM doctors and experts, we define 5 classes of TCM named entities and construct a comprehensive NER dataset containing the standard content of publications and clinical medical records. Experimental results on this dataset demonstrate the effectiveness of the proposed module.
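To make "character-level tagging" concrete, the following is a minimal sketch of how a Chinese sentence could be labeled one character at a time with a BIO scheme. It is an illustration only, not the authors' implementation: the function name `char_level_bio`, the example sentence, and the label `HERB` are all assumptions, since the abstract does not name the five entity classes or the tagging scheme used.

```python
from typing import List, Tuple


def char_level_bio(text: str, spans: List[Tuple[int, int, str]]) -> List[Tuple[str, str]]:
    """Assign one BIO tag to every character of `text`.

    `spans` holds (start, end_exclusive, label) character offsets.
    Hypothetical helper for illustration; not taken from the paper.
    """
    tags = ["O"] * len(text)
    for start, end, label in spans:
        if not (0 <= start < end <= len(text)):
            raise ValueError(f"span ({start}, {end}) is outside the text")
        tags[start] = f"B-{label}"          # first character of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"          # remaining characters of the entity
    return list(zip(text, tags))


# Example sentence: "the patient takes astragalus" (assumed, not from the dataset).
sentence = "患者服用黄芪"
# "HERB" is a made-up label; the paper's actual classes are not given in the abstract.
tagged = char_level_bio(sentence, [(4, 6, "HERB")])

for char, tag in tagged:
    print(char, tag)
```

Tagging per character avoids depending on a word segmenter, whose errors on specialized TCM vocabulary would otherwise propagate into the entity boundaries.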