
Research on Named Entity Recognition Technology for Chinese Titles
Author(s) -
Kong Zhang,
Gang Qian
Publication year - 2021
Publication title -
Journal of Physics: Conference Series
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.21
H-Index - 85
eISSN - 1742-6596
pISSN - 1742-6588
DOI - 10.1088/1742-6596/1732/1/012016
Subject(s) - computer science, character (mathematics), artificial intelligence, word (group theory), annotation, relevance (law), natural language processing, process (computing), feature (linguistics), pattern recognition (psychology), philosophy, linguistics, geometry, mathematics, political science, law, operating system
Named entity recognition (NER) on Chinese title data faces two problems: feature extraction is insufficient, and labeled data is difficult to obtain. To address them, this paper presents a CNN-BiLSTM-CRF model. Word vectors carrying semantic information are trained on a large-scale corpus, and a CNN is applied to the characters within each word to generate character vectors containing character-level features. The word vectors and character vectors are merged and used as the input to a BiLSTM network, which extracts features from both the preceding and the following context, and a CRF layer then constrains the dependencies between adjacent tags. In addition, this paper proposes an active learning algorithm. The algorithm uses the proposed model to construct a committee that selects the most informative samples for annotation, which effectively reduces the amount of data that must be annotated.
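The character-level step in the abstract (a CNN over the characters of each word, merged with the word vector) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names `conv1d_max_pool` and `word_representation`, the dimensions, the window width, the use of max-pooling with ReLU, and "merge" as concatenation are all hypothetical choices, and the random vectors stand in for pretrained word vectors and learned weights.

```python
import random

def conv1d_max_pool(char_vecs, filters, width):
    """Convolve each filter over windows of `width` character vectors,
    then max-pool over positions: one feature per filter (assumed design)."""
    dim = len(char_vecs[0])
    # Zero-pad words shorter than the window so at least one window exists.
    seq = char_vecs + [[0.0] * dim] * max(0, width - len(char_vecs))
    features = []
    for f in filters:  # each filter is a width x dim weight matrix
        best = float("-inf")
        for start in range(len(seq) - width + 1):
            window = seq[start:start + width]
            score = sum(w * x
                        for w_row, x_row in zip(f, window)
                        for w, x in zip(w_row, x_row))
            best = max(best, score)
        # ReLU; it is monotone, so applying it after max-pooling is equivalent.
        features.append(max(best, 0.0))
    return features

def word_representation(word, word_vectors, char_embeddings, filters, width=2):
    """Merge (here: concatenate) the word vector with CNN character features."""
    char_vecs = [char_embeddings[c] for c in word]
    return word_vectors[word] + conv1d_max_pool(char_vecs, filters, width)

# Hypothetical toy parameters; in practice these would be pretrained or learned.
rng = random.Random(0)
WORD_DIM, CHAR_DIM, N_FILTERS, WIDTH = 4, 3, 5, 2
words = ["北京大学", "学报"]  # a segmented title fragment (illustrative)
word_vectors = {w: [rng.uniform(-1, 1) for _ in range(WORD_DIM)] for w in words}
char_embeddings = {c: [rng.uniform(-1, 1) for _ in range(CHAR_DIM)]
                   for w in words for c in w}
filters = [[[rng.uniform(-1, 1) for _ in range(CHAR_DIM)] for _ in range(WIDTH)]
           for _ in range(N_FILTERS)]

# One merged vector per word: this sequence would be fed to the BiLSTM.
bilstm_inputs = [word_representation(w, word_vectors, char_embeddings, filters, WIDTH)
                 for w in words]
for w, vec in zip(words, bilstm_inputs):
    print(w, len(vec))  # WORD_DIM + N_FILTERS = 9
```

Each merged vector has the word-vector dimension plus one feature per CNN filter, so character-level cues (useful for rare or unseen words in titles) are available to the BiLSTM alongside the corpus-level semantics.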
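The abstract states that the CRF constrains the dependencies between tags. One way to see this is Viterbi decoding over per-token emission scores (such as BiLSTM outputs) plus tag-to-tag transition scores. In the sketch below the BIO tag set with a single hypothetical `ORG` type is an assumption (the abstract does not list the entity types), and the transition scores are hard-coded so that `I-ORG` may only follow `B-ORG` or `I-ORG`; a trained CRF would learn these scores rather than fix them to negative infinity.

```python
NEG_INF = float("-inf")

def viterbi(emissions, transitions, start_scores, tags):
    """Return the highest-scoring tag sequence.
    emissions[t][j]: score of tag j at position t (e.g. from the BiLSTM)
    transitions[i][j]: score of moving from tag i to tag j
    start_scores[j]: score of starting the sequence with tag j"""
    n_tags = len(tags)
    score = [start_scores[j] + emissions[0][j] for j in range(n_tags)]
    backpointers = []
    for t in range(1, len(emissions)):
        new_score, ptrs = [], []
        for j in range(n_tags):
            # Best previous tag i for current tag j.
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            ptrs.append(best_i)
        score = new_score
        backpointers.append(ptrs)
    # Trace back from the best final tag.
    best = max(range(n_tags), key=lambda j: score[j])
    path = [best]
    for ptrs in reversed(backpointers):
        best = ptrs[best]
        path.append(best)
    return [tags[j] for j in reversed(path)]

TAGS = ["O", "B-ORG", "I-ORG"]  # hypothetical tag set
# Assumed constraint: I-ORG cannot start a sequence or follow O.
START = [0.0, 0.0, NEG_INF]
TRANS = [[0.0, 0.0, NEG_INF],   # from O
         [0.0, 0.0, 0.0],       # from B-ORG
         [0.0, 0.0, 0.0]]       # from I-ORG
# Toy emission scores for three tokens (columns: O, B-ORG, I-ORG).
EMIT = [[2.0, 1.0, 0.0],
        [0.0, 0.5, 1.0],
        [0.0, 0.0, 1.0]]

greedy = [TAGS[max(range(3), key=lambda j: row[j])] for row in EMIT]
print(greedy)                             # ['O', 'I-ORG', 'I-ORG'] (invalid BIO)
print(viterbi(EMIT, TRANS, START, TAGS))  # ['O', 'B-ORG', 'I-ORG']
```

Picking the best tag per token independently yields an invalid `O` to `I-ORG` transition, whereas joint decoding over the transition scores returns a well-formed sequence. This is the kind of tag-level restriction the CRF layer adds on top of the BiLSTM.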
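The active learning step, in which a committee built from the proposed model selects informative samples to annotate, resembles query-by-committee. The abstract does not say how informativeness is measured, so this sketch assumes average per-token vote entropy, a common query-by-committee criterion. The functions `vote_entropy`, `select_for_annotation`, and `make_tagger`, the character-level `ORG`/`O` tags, and the rule-based toy taggers (standing in for CNN-BiLSTM-CRF models trained on different subsets of the labeled data) are all hypothetical.

```python
import math
from collections import Counter

def vote_entropy(predictions):
    """Average per-token vote entropy of a committee (assumed criterion).
    predictions: one tag sequence per committee member, all the same length."""
    n_members = len(predictions)
    n_tokens = len(predictions[0])
    total = 0.0
    for t in range(n_tokens):
        votes = Counter(seq[t] for seq in predictions)
        total -= sum((c / n_members) * math.log(c / n_members)
                     for c in votes.values())
    return total / n_tokens

def select_for_annotation(unlabeled, committee, k):
    """Return the k samples on which the committee disagrees most."""
    scored = [(vote_entropy([member(x) for member in committee]), i)
              for i, x in enumerate(unlabeled)]
    scored.sort(key=lambda s: (-s[0], s[1]))  # highest entropy first, stable ties
    return [unlabeled[i] for _, i in scored[:k]]

def make_tagger(entity_chars):
    """Hypothetical stand-in for a trained model: tags listed characters as ORG."""
    return lambda title: ["ORG" if ch in entity_chars else "O" for ch in title]

committee = [make_tagger("北京大学"), make_tagger("北京"), make_tagger("大学")]
unlabeled = ["研究综述", "北京大学", "大学学报"]

picked = select_for_annotation(unlabeled, committee, k=2)
print(picked)  # ['北京大学', '大学学报']
```

Samples on which every member agrees (here `研究综述`) score zero and are skipped, so annotation effort goes to the titles the current models find ambiguous; after labeling them, the committee would be retrained and the loop repeated.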