Capsules Based Chinese Word Segmentation for Ancient Chinese Medical Books
Author(s) -
Si Li,
Mingzheng Li,
Yajing Xu,
Zuyi Bao,
Lu Fu,
Yan Zhu
Publication year - 2018
Publication title -
IEEE Access
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.587
H-Index - 127
ISSN - 2169-3536
DOI - 10.1109/access.2018.2881280
Subject(s) - aerospace , bioengineering , communication, networking and broadcast technologies , components, circuits, devices and systems , computing and processing , engineered materials, dielectrics and plasmas , engineering profession , fields, waves and electromagnetics , general topics for engineers , geoscience , nuclear engineering , photonics and electrooptics , power, energy and industry applications , robotics and control systems , signal processing and analysis , transportation
Neural network models are widely used for the Chinese word segmentation task. The recently proposed capsule architecture addresses some shortcomings of convolutional neural networks. In this paper, we first introduce the capsule architecture to Chinese word segmentation, using capsules as neural units. Before running the routing algorithm, we apply a sliding capsule window to select the features extracted by the primary capsule layer. This sliding capsule window adapts the capsule architecture to the sequence labeling task. Experimental results show that the proposed capsule-based Chinese word segmentation model achieves performance competitive with previous state-of-the-art methods. Ancient Chinese medical books record a wealth of valuable experience from ancient medical practitioners, yet research on automatic text analysis of ancient Chinese medical documents is only beginning. Because annotated data for Chinese medicine are lacking, we develop word segmentation guidelines for ancient Chinese medical documents and select 30 ancient Chinese medical books spanning 10 genres to build an annotated dataset. Using these annotated data, we develop a segmenter for ancient Chinese medical text. Experiments show that our model achieves F1 scores of 94.9% on Chinese Treebank 6.0 and 81.4% on the Ancient Chinese Medical Books dataset.
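The abstract names two steps, a sliding capsule window over the primary capsule layer followed by a routing algorithm, without giving details. The pure-Python sketch below illustrates one way such a pipeline could work; it is not the authors' implementation. Assumptions (all hypothetical): each character position holds several primary capsules (small vectors); the window collects the capsules of the `half` neighbours on each side, truncated at sentence borders; routing is the standard routing-by-agreement from the original capsule network work, with one transformation matrix per output label shared across window inputs as a simplification; and the output capsules correspond to BMES tags, with the longest output vector chosen as the label. The names `sliding_window`, `dynamic_routing`, and `segment_labels` are illustrative only.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]

# Hypothetical BMES tag set: Begin / Middle / End of a word, Single-character word.
LABELS = ["B", "M", "E", "S"]


def squash(s: Vector) -> Vector:
    """Capsule non-linearity: keeps the direction of s, maps its length into [0, 1)."""
    sq_norm = sum(x * x for x in s)
    if sq_norm == 0.0:
        return [0.0] * len(s)
    scale = sq_norm / (1.0 + sq_norm) / math.sqrt(sq_norm)
    return [scale * x for x in s]


def matvec(w: Matrix, u: Vector) -> Vector:
    """Plain matrix-vector product."""
    return [sum(wk * uk for wk, uk in zip(row, u)) for row in w]


def softmax(xs: Vector) -> Vector:
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def sliding_window(primary: List[List[Vector]], t: int, half: int) -> List[Vector]:
    """Collect primary capsules at positions t-half..t+half (truncated at borders)."""
    lo, hi = max(0, t - half), min(len(primary), t + half + 1)
    return [cap for pos in range(lo, hi) for cap in primary[pos]]


def dynamic_routing(inputs: List[Vector], weights: List[Matrix],
                    iterations: int = 3) -> List[Vector]:
    """Routing-by-agreement from the window capsules to one output capsule per label."""
    n_out = len(weights)
    # Prediction vectors u_hat[i][j] = W_j u_i (W_j shared across inputs: a simplification).
    u_hat = [[matvec(weights[j], u) for j in range(n_out)] for u in inputs]
    logits = [[0.0] * n_out for _ in inputs]  # routing logits b_ij, start uniform
    outputs: List[Vector] = []
    for r in range(iterations):
        coupling = [softmax(b) for b in logits]  # c_ij: softmax over output capsules j
        outputs = []
        for j in range(n_out):
            dim = len(u_hat[0][j])
            s_j = [sum(coupling[i][j] * u_hat[i][j][k] for i in range(len(inputs)))
                   for k in range(dim)]
            outputs.append(squash(s_j))
        if r < iterations - 1:
            # Increase b_ij by the agreement (dot product) between u_hat_ij and v_j.
            for i in range(len(inputs)):
                for j in range(n_out):
                    logits[i][j] += sum(a * b for a, b in zip(u_hat[i][j], outputs[j]))
    return outputs


def segment_labels(primary: List[List[Vector]], weights: List[Matrix],
                   half: int = 1) -> List[str]:
    """Tag each character with the label whose output capsule is longest."""
    labels = []
    for t in range(len(primary)):
        v = dynamic_routing(sliding_window(primary, t, half), weights)
        lengths = [math.sqrt(sum(x * x for x in vj)) for vj in v]
        labels.append(LABELS[lengths.index(max(lengths))])
    return labels
```

In a trained model the primary capsules would come from a convolutional layer over character embeddings and the transformation matrices would be learned; here both are supplied by hand. Truncating the window at sentence borders is one choice; zero-padding would also work, since zero input capsules produce zero prediction vectors and add nothing to the weighted sums.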