A New Unsupervised Approach to Word Segmentation

Hanshi Wang; Jian Zhu; Shiping Tang; Xiaozhong Fan

Open Access

Publication year - 2011

Publication title - Computational Linguistics

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 0.314

H-Index - 98

eISSN - 1530-9312

pISSN - 0891-2017

DOI - 10.1162/coli_a_00058

Subject(s) - computer science, segmentation, word (group theory), character (mathematics), artificial intelligence, process (computing), selection (genetic algorithm), set (abstract data type), data mining, pattern recognition (psychology), mathematics, geometry, programming language, operating system

This article proposes ESA, a new unsupervised approach to word segmentation. ESA is an iterative process consisting of three phases: Evaluation, Selection, and Adjustment. In Evaluation, both the certainty and the uncertainty of character sequence co-occurrence in corpora are considered as statistical evidence for measuring goodness. In addition, a simple process called Balancing makes the statistical data of character sequences of different lengths comparable with one another. In Selection, a local maximum strategy is adopted without thresholds, and the strategy can be implemented with dynamic programming. In Adjustment, part of the statistical data is updated to improve subsequent results. In our experiments, ESA was evaluated on the SIGHAN Bakeoff-2 data set. The results suggest that ESA is effective on Chinese corpora. Notably, the F-measures of the results increase almost monotonically and converge rapidly to relatively high values. Furthermore, empirical formulae based on the results can be used to predict the parameter in ESA, avoiding parameter estimation, which is usually time-consuming.
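
The abstract notes that Selection can be implemented with dynamic programming. The sketch below illustrates that general idea only, under loudly stated assumptions: the names `count_sequences`, `goodness`, and `segment`, the `exponent` parameter, and the frequency-times-length score are all hypothetical stand-ins, not the article's goodness measure or Balancing formula. The objective used here (maximizing the summed score of the chosen words) is also an assumption, and the Adjustment phase is not modeled.

```python
from collections import Counter


def count_sequences(corpus, max_len):
    """Count every character sequence of length 1..max_len in the corpus."""
    counts = Counter()
    for sentence in corpus:
        for i in range(len(sentence)):
            for j in range(i + 1, min(i + max_len, len(sentence)) + 1):
                counts[sentence[i:j]] += 1
    return counts


def goodness(seq, counts, exponent):
    """Hypothetical stand-in score: frequency scaled by length ** exponent.

    The scaling loosely mimics making sequences of different lengths
    comparable; it is NOT the Balancing formula from the article.
    """
    return counts[seq] * (len(seq) ** exponent)


def segment(sentence, counts, max_len, exponent=2.0):
    """Pick the segmentation with the highest total goodness via DP."""
    n = len(sentence)
    best = [0.0] + [float("-inf")] * n  # best[i]: top score for sentence[:i]
    back = [0] * (n + 1)                # back[i]: start index of the last word
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            score = best[start] + goodness(sentence[start:end], counts, exponent)
            if score > best[end]:
                best[end], back[end] = score, start
    # Walk the back-pointers from the end to recover the chosen words.
    words, end = [], n
    while end > 0:
        words.append(sentence[back[end]:end])
        end = back[end]
    return words[::-1]


# Toy usage on a made-up corpus (not SIGHAN data).
corpus = ["abab", "abcd", "cdab"]
counts = count_sequences(corpus, max_len=2)
print(segment("abcd", counts, max_len=2))  # ['ab', 'cd']
```

No threshold appears anywhere in this sketch: each position simply keeps the best-scoring way to reach it, which is why the search runs in time proportional to the sentence length times `max_len`.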
