
An Algorithm Based on Simple CNN and BI_LSTM Network for Chinese Word Segmentation
Author(s) - Xiaohan Guan, Xin Liu, Zhi Li
Publication year - 2020
Publication title - Journal of Physics: Conference Series
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.21
H-Index - 85
eISSN - 1742-6596
pISSN - 1742-6588
DOI - 10.1088/1742-6596/1621/1/012001
Subject(s) - computer science, sentence, feature (linguistics), word (group theory), artificial intelligence, simple (philosophy), segmentation, text segmentation, pattern recognition (psychology), natural language processing, algorithm, mathematics, philosophy, linguistics, geometry, epistemology
Word segmentation is an indispensable and critical step in most Chinese NLP tasks, and its quality affects the accuracy of subsequent tasks. The traditional BI_LSTM network cannot extract features effectively, and the CNN network cannot deal with the timing problem effectively. This paper therefore proposes a new algorithm that addresses both problems. First, the features of the Chinese characters in a whole sentence are extracted and recombined using the CNN network; then, the recombined features are combined in the BI_LSTM network; finally, each word of the whole sentence is classified using weight sharing, and the corresponding classification of each word is the output. The algorithm thus uses the strengths of the CNN network for feature extraction while retaining the strengths of the BI_LSTM network for timing processing. As a result, the learning ability of the BI_LSTM network is increased, and the accuracy of the output is improved to 98%.
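The final step described above, turning per-position classifications into a segmented sentence, can be illustrated with a small sketch. The abstract does not name the label set, so the code below assumes the common B/M/E/S tagging scheme (begin, middle, end, single) applied per character; the function name `decode_bmes` and the sample tags are hypothetical and are not taken from the paper.

```python
from typing import List


def decode_bmes(chars: List[str], tags: List[str]) -> List[str]:
    """Group characters into words from per-character tags.

    Assumed tag set (not stated in the abstract):
    B = begin, M = middle, E = end of a multi-character word,
    S = single-character word.
    """
    if len(chars) != len(tags):
        raise ValueError("chars and tags must have the same length")

    words: List[str] = []
    current = ""  # characters of the word currently being built
    for ch, tag in zip(chars, tags):
        if tag == "S":
            # Flush any unfinished word, then emit the single character.
            if current:
                words.append(current)
                current = ""
            words.append(ch)
        elif tag == "B":
            # Flush any unfinished word and start a new one.
            if current:
                words.append(current)
            current = ch
        elif tag == "M":
            current += ch
        elif tag == "E":
            words.append(current + ch)
            current = ""
        else:
            raise ValueError(f"unknown tag: {tag!r}")

    # Keep a trailing unfinished word rather than dropping characters.
    if current:
        words.append(current)
    return words


if __name__ == "__main__":
    # Hypothetical tagger output for a four-character sentence.
    print(decode_bmes(list("我爱北京"), ["S", "S", "B", "E"]))
```

In a pipeline of the kind the abstract outlines, the tag sequence would come from the classifier applied to the BI_LSTM outputs; here it is supplied by hand. Flushing unfinished words instead of raising on an inconsistent tag sequence is a design choice of this sketch, made so that no input characters are lost.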