Chicon—a Chinese text manipulation language
Author(s) - Wong KamFai, Lum Vincent Y., Lam WaiIp
Publication year - 1998
Publication title - Software: Practice and Experience
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.437
H-Index - 70
eISSN - 1097-024X
pISSN - 0038-0644
DOI - 10.1002/(sici)1097-024x(199807)28:7<681::aid-spe172>3.0.co;2-n
Subject(s) - icon , computer science , disk formatting , natural language processing , ascii , string (physics) , chinese characters , character (mathematics) , text processing , artificial intelligence , linguistics , programming language , philosophy , physics , quantum mechanics , operating system , geometry , mathematics
Text processing is an important computer application. Due to its importance, a number of text manipulation programming languages have been devised (e.g. Icon). These programming languages are very useful for applications such as natural language processing, text analysis, text editing, document formatting, text generation, etc. However, they were mainly designed to handle English texts, and are ineffective for Chinese. This is because English and Chinese texts are represented very differently in a computer. An English character is mainly represented in 7‐bit ASCII, and its Chinese counterpart commonly in 16‐bit GB or BIG‐5. This difference makes direct application of English‐based text manipulation programming languages to Chinese erroneous, e.g. application of Icon to reverse a string of Chinese characters. In this paper, a new dialect of Icon, referred to as Chicon (i.e. Chinese Icon), is proposed. In the design of Chicon, new data types were introduced to differentiate pure English and English/Chinese mixed texts. In addition, existing Icon text manipulation functions were modified to account for Chinese texts. Experiments have shown that Chicon not only could overcome the problems of Chinese processing in Icon, but its execution speed was actually superior to Icon in handling Chinese. Furthermore, application of Chicon to a real sized problem, namely word segmentation, has proved that the language is practical. © 1998 John Wiley & Sons, Ltd.
