
Research on domain terminology recognition based on dependency tree-conditional random field
Author(s) - Yanyan Lin, Jing Lü
Publication year - 2019
Publication title - Journal of Physics: Conference Series
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.21
H-Index - 85
eISSN - 1742-6596
pISSN - 1742-6588
DOI - 10.1088/1742-6596/1213/5/052076
Manual labeling and classification of Chinese patent information are inconsistent, which causes problems in patent search such as missed detections, partial detections and noisy results. To address this, this paper proposes a method for recognizing domain terminology based on a dependency tree and a conditional random field (CRF). The method draws on modern dependency grammar theory and uses existing techniques to label dependency relations. The corresponding technical feature words are then identified in the dependency-labelling results and used as training data for a CRF model that recognizes domain terminology. Experimental results show that acquiring training data through the dependency tree improves the accuracy, recall and F value of the recognition results.
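The abstract does not state the rules used to pick technical feature words out of the dependency labelling. The following is a minimal sketch of that step under stated assumptions, not the authors' method: parser output is assumed to be in an LTP-style format (1-based head indices, 0 for the root, relation labels such as `ATT` for attribute, `SBV` for subject and `VOB` for verb-object); the rule in the hypothetical function `term_spans` treats a noun head plus the contiguous attribute modifiers to its left as a candidate term; and the hypothetical `to_bio` converts those spans into BIO tags (`B-TERM`, `I-TERM`, `O`), a common label scheme for CRF sequence labeling. English glosses stand in for segmented Chinese tokens.

```python
from typing import List, Tuple

# One parsed token: (word, POS tag, head, relation). Heads are 1-based with
# 0 for the root; relation labels are assumed to follow the LTP scheme
# ("ATT" = attribute modifier).
Token = Tuple[str, str, int, str]

# Hypothetical choice: nouns ("n...") and adjectives ("a...") may modify a term head.
MODIFIER_POS = ("n", "a")


def term_spans(sent: List[Token]) -> List[Tuple[int, int]]:
    """Return maximal multi-token candidate terms as (start, end), end exclusive."""
    spans = []
    for i, (_, pos, _, rel) in enumerate(sent):
        # A term head is a noun that does not itself modify another word.
        if not pos.startswith("n") or rel == "ATT":
            continue
        j = i - 1
        # Walk left over attribute modifiers whose head lies inside the span.
        while j >= 0:
            _, mpos, mhead, mrel = sent[j]
            if mrel == "ATT" and mpos.startswith(MODIFIER_POS) and j < mhead - 1 <= i:
                j -= 1
            else:
                break
        if j + 1 < i:  # keep compounds of two or more tokens only
            spans.append((j + 1, i + 1))
    return spans


def to_bio(n: int, spans: List[Tuple[int, int]]) -> List[str]:
    """Turn term spans into one BIO tag per token (the CRF training labels)."""
    tags = ["O"] * n
    for start, end in spans:
        tags[start] = "B-TERM"
        for k in range(start + 1, end):
            tags[k] = "I-TERM"
    return tags


# Made-up example: English glosses stand in for segmented Chinese tokens.
SENTENCE: List[Token] = [
    ("device", "n", 2, "SBV"),
    ("uses", "v", 0, "HED"),
    ("laser", "n", 5, "ATT"),
    ("distance", "n", 5, "ATT"),
    ("sensor", "n", 2, "VOB"),
]

if __name__ == "__main__":
    print(to_bio(len(SENTENCE), term_spans(SENTENCE)))
    # ['O', 'O', 'B-TERM', 'I-TERM', 'I-TERM']
```

Requiring at least two tokens is one possible design choice for favoring compound technical terms over single common nouns such as "device"; a real system would likely need additional filters.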
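Once trained, a linear-chain CRF labels a new sentence by finding the highest-scoring tag sequence, which is usually done with the Viterbi algorithm. The sketch below assumes the per-token label scores (emissions) and label-bigram scores (transitions) have already been computed from learned feature weights; the function name `viterbi` and all numbers in the example are hypothetical and illustrative, not taken from the paper.

```python
from typing import Dict, List, Sequence, Tuple


def viterbi(emissions: Sequence[Dict[str, float]],
            transitions: Dict[Tuple[str, str], float],
            labels: Sequence[str]) -> List[str]:
    """Return the highest-scoring label sequence under a linear-chain CRF.

    emissions[t][y]: summed weight of the features firing for label y at token t.
    transitions[(p, y)]: weight of label p followed by label y (missing pairs score 0).
    Assumes at least one token.
    """
    # best[y] = score of the best path ending in label y at the current token.
    best = {y: emissions[0][y] for y in labels}
    backptrs: List[Dict[str, str]] = []
    for t in range(1, len(emissions)):
        new_best: Dict[str, float] = {}
        ptr: Dict[str, str] = {}
        for y in labels:
            # Pick the predecessor label that maximizes path score plus transition.
            prev = max(labels, key=lambda p: best[p] + transitions.get((p, y), 0.0))
            new_best[y] = best[prev] + transitions.get((prev, y), 0.0) + emissions[t][y]
            ptr[y] = prev
        best = new_best
        backptrs.append(ptr)
    # Trace back from the best final label.
    y = max(labels, key=lambda lab: best[lab])
    path = [y]
    for ptr in reversed(backptrs):
        y = ptr[y]
        path.append(y)
    return path[::-1]


LABELS = ["O", "B-TERM", "I-TERM"]
# Illustrative scores; a large negative weight forbids the invalid O -> I-TERM transition.
TRANSITIONS = {("O", "I-TERM"): -10.0, ("B-TERM", "I-TERM"): 1.0}
EMISSIONS = [
    {"O": 2.0, "B-TERM": 0.5, "I-TERM": 0.0},
    {"O": 0.2, "B-TERM": 1.5, "I-TERM": 1.0},
    {"O": 0.3, "B-TERM": 0.1, "I-TERM": 1.2},
]

if __name__ == "__main__":
    print(viterbi(EMISSIONS, TRANSITIONS, LABELS))  # ['O', 'B-TERM', 'I-TERM']
```

Because decoding scores whole sequences rather than individual tokens, the transition weights let the model rule out ill-formed tag sequences, which is one reason CRFs are a common choice for term and entity recognition.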
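The abstract reports accuracy, recall and F value without defining them. In term recognition these are commonly computed over term spans, with "accuracy" often meaning precision; the minimal sketch below assumes exact span matching and a single `TERM` type, which the abstract does not confirm. The helper names `bio_spans` and `prf` are hypothetical.

```python
from typing import List, Set, Tuple


def bio_spans(tags: List[str]) -> Set[Tuple[int, int]]:
    """Collect (start, end) term spans, end exclusive, from BIO tags."""
    spans: Set[Tuple[int, int]] = set()
    start = None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing span
        if tag.startswith("B-") or tag == "O" or (tag.startswith("I-") and start is None):
            if start is not None:
                spans.add((start, i))
                start = None
            if tag != "O":
                start = i  # a stray I- tag is leniently treated as a span start
        # Otherwise the tag is I- and continues the current span.
    return spans


def prf(gold: List[str], pred: List[str]) -> Tuple[float, float, float]:
    """Span-level precision, recall and F1 under exact matching."""
    g, p = bio_spans(gold), bio_spans(pred)
    tp = len(g & p)
    precision = tp / len(p) if p else 0.0
    recall = tp / len(g) if g else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    gold = ["O", "B-TERM", "I-TERM", "O", "B-TERM"]
    pred = ["O", "B-TERM", "I-TERM", "B-TERM", "I-TERM"]
    print(prf(gold, pred))  # (0.5, 0.5, 0.5)
```

Under exact matching, a predicted term that only partly overlaps a gold term counts as both a false positive and a false negative, so partial detections of the kind the abstract mentions lower both precision and recall.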