Words without Boundaries: Computational Approaches to Chinese Word Segmentation
Author(s) - Huang ChuRen, Xue Nianwen
Publication year - 2012
Publication title - Language and Linguistics Compass
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.619
H-Index - 44
ISSN - 1749-818X
DOI - 10.1002/lnc3.357
Subject(s) - computer science, natural language processing, segmentation, orthography, text segmentation, artificial intelligence, character (mathematics), word (group theory), robustness (evolution), computational linguistics, linguistics, domain (mathematical analysis), mathematics, mathematical analysis, philosophy, biochemistry, geometry, reading (process), chemistry, gene
Because words are not conventionally demarcated in Chinese orthography, word segmentation is a non-trivial task, and it remains a challenging topic in Chinese computational linguistics. We survey previous approaches to Chinese word segmentation, including dictionary look-up, strength of internal binding, as well as character tagging and machine learning. We then propose the Word Boundary Decision (WBD) approach, which requires no prior lexical knowledge. The WBD model is shown to greatly reduce the complexity of Chinese word segmentation and may provide a promising approach to domain adaptation and robustness issues.
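To make the boundary-decision framing concrete, the following minimal Python sketch treats segmentation as a sequence of binary decisions, one for each gap between adjacent characters. It is an illustration only, not the authors' implementation: the function names `words_to_boundaries` and `boundaries_to_words` are hypothetical, the example sentence is our own, and the sketch assumes that each gap is labeled 1 (boundary) or 0 (no boundary). How a model would predict those labels is left out entirely.

```python
from typing import List


def words_to_boundaries(words: List[str]) -> List[int]:
    """Convert a segmented sentence into one binary label per character gap.

    Label 1 marks a word boundary between two adjacent characters,
    label 0 marks a gap that falls inside a word.
    """
    labels: List[int] = []
    for i, word in enumerate(words):
        # Gaps inside a word are never boundaries.
        labels.extend([0] * (len(word) - 1))
        # The gap after every word except the last one is a boundary.
        if i < len(words) - 1:
            labels.append(1)
    return labels


def boundaries_to_words(text: str, labels: List[int]) -> List[str]:
    """Rebuild the word list from raw text plus one label per character gap."""
    if len(labels) != max(len(text) - 1, 0):
        raise ValueError("expected exactly one label per gap between characters")
    words: List[str] = []
    start = 0
    for i, label in enumerate(labels):
        if label == 1:
            # Gap i sits between text[i] and text[i + 1].
            words.append(text[start:i + 1])
            start = i + 1
    if text:
        words.append(text[start:])
    return words


# Hypothetical example sentence, segmented by hand for illustration.
gold = ["中文", "分词", "很", "难"]
labels = words_to_boundaries(gold)
restored = boundaries_to_words("".join(gold), labels)
```

Under this framing a sentence of n characters needs n - 1 binary decisions, and no lexicon is consulted when converting between labels and words, which is one way to read the claim that no prior lexical knowledge is required.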