A Chinese Message Sensitive Words Filtering System based on DFA and Word2vec
Author(s) - Fei Wu, Yuxiang Cai
Publication year - 2018
Publication title - Procedia Computer Science
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.334
H-Index - 76
ISSN - 1877-0509
DOI - 10.1016/j.procs.2018.10.271
Subject(s) - computer science , word2vec , artificial intelligence , word (group theory) , natural language processing , sentence , mutual information , segmentation , basis (linear algebra) , construct (python library) , similarity (geometry) , stop words , pattern recognition (psychology) , thesaurus , point (geometry) , speech recognition , image (mathematics) , embedding , philosophy , linguistics , geometry , mathematics , preprocessor , programming language
This paper proposes a sensitive-word filtering system for Chinese messages in an instant messaging environment. First, the message sentence is segmented, and the segmentation result is corrected using an association algorithm based on information entropy and pointwise mutual information. A dictionary tree is then built following the traditional DFA algorithm for sensitive word recognition, which effectively improves recognition speed. Second, once recognition is complete, a pre-trained word vector model is used to compare the words in the sensitive word list with the word segmentation results; words with high similarity to a sensitive word are added to the list, thereby expanding and improving the sensitive thesaurus.
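The two stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`build_trie`, `find_sensitive`, `expand_lexicon`), the longest-match policy, the 0.8 similarity threshold, and the toy two-dimensional vectors are all assumptions made for illustration. In practice the vectors would come from a pre-trained word2vec model, and the tokens from a Chinese word segmenter.

```python
import math

END = ""  # sentinel key marking the end of a word (no real character is empty)


def build_trie(words):
    """Build a nested-dict dictionary tree; each path from the root spells a word."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = True
    return root


def find_sensitive(text, trie):
    """Scan text once, returning (start_index, word) for the longest match
    at each position (an assumed policy; the abstract does not specify one)."""
    hits = []
    i = 0
    while i < len(text):
        node = trie
        longest = 0
        for j in range(i, len(text)):
            node = node.get(text[j])
            if node is None:
                break  # no transition for this character: the automaton rejects
            if END in node:
                longest = j - i + 1  # remember the longest accepted prefix
        if longest:
            hits.append((i, text[i:i + longest]))
            i += longest
        else:
            i += 1
    return hits


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_lexicon(sensitive, tokens, vectors, threshold=0.8):
    """Return segmented tokens whose vector is close to any sensitive word's vector.
    `vectors` maps word -> list of floats (hypothetical stand-in for word2vec)."""
    added = set()
    for tok in tokens:
        if tok in sensitive or tok not in vectors:
            continue
        if any(s in vectors and cosine(vectors[tok], vectors[s]) >= threshold
               for s in sensitive):
            added.add(tok)
    return added


# Toy data, invented for this example only.
trie = build_trie(["赌博", "赌博机"])
hits = find_sensitive("他买了赌博机", trie)
toy_vectors = {"赌博": [1.0, 0.0], "博彩": [0.9, 0.1], "天气": [0.0, 1.0]}
new_words = expand_lexicon({"赌博"}, ["博彩", "天气"], toy_vectors)
```

Because each character triggers at most one dictionary lookup per trie level, matching cost depends on the text length and the longest sensitive word rather than on the size of the word list, which is consistent with the speed benefit the abstract attributes to the dictionary tree.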
