
Research on Feature Extraction Technology of Japanese Corpus Resources Based on Rule Matching
Author(s) - Lele Zhang
Publication year - 2021
Publication title - Journal of Physics: Conference Series
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.21
H-Index - 85
eISSN - 1742-6596
pISSN - 1742-6588
DOI - 10.1088/1742-6596/1982/1/012092
Subject(s) - computer science, matching (statistics), relation (database), relationship extraction, artificial intelligence, position (finance), factor (programming language), feature extraction, word (group theory), speech recognition, feature (linguistics), pattern recognition (psychology), natural language processing, part of speech tagging, part of speech, data mining, mathematics, statistics, linguistics, philosophy, geometry, finance, economics, programming language
Existing relation extraction methods fall into three groups: pattern-matching methods, dictionary-driven methods, and machine-learning methods, of which machine-learning methods are currently the mainstream. Based on an analysis of Snort’s new features and of existing rule-matching methods, this paper combines the Snort algorithm with features of the Japanese language and corrects the keyword deviation caused by dependence on word frequency by introducing multiple feature values, such as a position factor and a part-of-speech factor. Experimental results show that introducing the part-of-speech and position factors is simple and efficient and effectively improves Japanese keyword extraction, making the approach suitable for extracting keywords from short articles.
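To illustrate how a position factor and a part-of-speech factor can offset a pure dependence on word frequency, the following is a minimal Python sketch. It is not the paper's Snort-based algorithm. The scoring rule (each occurrence contributes its part-of-speech weight times its position weight), the weight values, and the names `POS_WEIGHT`, `POSITION_WEIGHT`, `score_keywords`, and `sample` are all assumptions made for illustration. The input is assumed to be already tokenized and tagged.

```python
from collections import defaultdict

# Hypothetical weights: the abstract gives no numeric values.
POS_WEIGHT = {"noun": 1.0, "verb": 0.6, "adjective": 0.4}
POSITION_WEIGHT = {"title": 2.0, "lead": 1.5, "body": 1.0}


def score_keywords(tokens, top_k=3):
    """Rank candidate keywords from (surface, pos, location) triples.

    Each occurrence adds pos_weight * position_weight to the word's score,
    so frequency still counts but is scaled by the two factors.
    """
    scores = defaultdict(float)
    for surface, pos, location in tokens:
        pos_w = POS_WEIGHT.get(pos, 0.0)          # unlisted POS (e.g. particles) score 0
        loc_w = POSITION_WEIGHT.get(location, 1.0)
        scores[surface] += pos_w * loc_w
    # Sort by descending score; break ties by surface form for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, score in ranked[:top_k] if score > 0]


# Assumed pre-tagged input; a real pipeline would obtain this from a
# Japanese morphological analyzer.
sample = (
    [("コーパス", "noun", "title")]
    + [("抽出", "noun", "body")]
    + [("する", "verb", "body")] * 3
    + [("の", "particle", "body")] * 5
)

print(score_keywords(sample, top_k=2))
```

In this sample, the particle の occurs most often but receives a part-of-speech weight of zero, while the single title occurrence of コーパス outranks the more frequent body tokens. This is the kind of correction the abstract attributes to the two factors.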