Open Access
A Contrastive Study of Chinese Text Segmentation Tools in Marketing Notification Texts
Author(s) -
Xianwei Zhang,
Peng Wu,
Jiuming Cai,
Kun Wang
Publication year - 2019
Publication title -
Journal of Physics: Conference Series
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.21
H-Index - 85
eISSN - 1742-6596
pISSN - 1742-6588
DOI - 10.1088/1742-6596/1302/2/022010
Subject(s) - segmentation, computer science, text segmentation, natural language processing, artificial intelligence, word (group theory), market segmentation, base (topology), precision and recall, word lists by frequency, pattern recognition (psychology), information retrieval, linguistics, marketing, mathematics, mathematical analysis, philosophy, sentence, business
Marketing notification texts carry a wide range of commercial information, so analyzing and mining them is worthwhile. Chinese word segmentation is the foundation of that work, and its speed and accuracy strongly influence subsequent text mining. We compared the accuracy, recall, and F-value of four open-source Chinese word segmentation tools (Ansj, HanLP, Word, and Jieba) on third-party datasets. We then compared the segmentation speed of the four tools on one million marketing notification texts. Finally, we manually segmented 5,000 marketing notification texts and used the manual segmentation as the evaluation standard for assessing the tools. The experiments show that the Base mode of Ansj is the fastest, and that HanLP offers the best balance between segmentation speed and accuracy. Adding a custom dictionary significantly improved the segmentation results.
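
The abstract does not say how the scores were computed. As a minimal sketch, the code below shows one common way to score a segmenter against a manually segmented reference: each word is mapped to its character span, and a predicted word counts as correct only if the same span appears in the reference. It assumes that "accuracy" here means precision, as is common in word segmentation evaluation. The function names `to_spans` and `evaluate` and the sample sentence are hypothetical illustrations, not taken from the paper.

```python
def to_spans(words):
    """Map a segmented sentence (list of words) to a set of (start, end) character offsets."""
    spans = set()
    start = 0
    for word in words:
        end = start + len(word)
        spans.add((start, end))
        start = end
    return spans


def evaluate(gold_sentences, pred_sentences):
    """Return (precision, recall, f_value) of predicted segmentation against a gold standard.

    Assumes both inputs are equal-length lists of word lists covering the same text.
    """
    correct = gold_total = pred_total = 0
    for gold, pred in zip(gold_sentences, pred_sentences):
        gold_spans = to_spans(gold)
        pred_spans = to_spans(pred)
        correct += len(gold_spans & pred_spans)  # words with identical boundaries
        gold_total += len(gold_spans)
        pred_total += len(pred_spans)
    precision = correct / pred_total if pred_total else 0.0
    recall = correct / gold_total if gold_total else 0.0
    denom = precision + recall
    f_value = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_value


# Hypothetical example: one manually segmented sentence vs. a tool's output.
gold = [["优惠", "活动", "开始"]]
pred = [["优惠", "活动开始"]]
precision, recall, f_value = evaluate(gold, pred)
print(precision, recall, f_value)
```

In this toy case only one of the two predicted words matches a reference span, while the reference contains three words, so precision is higher than recall.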
