Automatic syllable segmentation of Myanmar texts using finite state transducer
Author(s) -
Tin Htay Hlaing,
Yoshiki Mikami
Publication year - 2014
Publication title -
International Journal on Advances in ICT for Emerging Regions (ICTer)
Language(s) - English
Resource type - Journals
eISSN - 2550-2794
pISSN - 1800-4156
DOI - 10.4038/icter.v6i2.7150
Subject(s) - syllabification, computer science, syllable, artificial intelligence, natural language processing, unicode, speech recognition, grammar, scripting language, transliteration, linguistics, philosophy, operating system
Automatic syllabification lies at the heart of script processing, especially for Southeast Asian scripts such as Myanmar. Myanmar syllabification algorithms implemented so far take either a rule-based or a data-driven approach. This paper proposes a new method for Myanmar syllabification that uses a formal grammar and unweighted finite state transducers (FSTs), because Myanmar syllabification relies heavily on a formal model of syllable structure. The proposed method performs orthographic syllabification of input texts encoded in Unicode. We handle Myanmar words with standard syllable structure as well as words with irregular structures, such as kinzi and consonant stacking, which previous methods have not resolved. Our FST-based syllabifier was tested on 11,732 distinct words extracted from the Myanmar Orthography Corpus. These words yielded 32,238 syllables, which were compared against correct hand-syllabified words. The method achieves 99.93% accuracy. We used the Stuttgart FST tools for our experiments.
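To make the idea of orthographic, finite-state syllable breaking concrete, here is a minimal Python sketch. It is not the grammar or the Stuttgart FST implementation described in the abstract. It assumes a single simplified rule expressed as a regular expression (which a finite-state machine can recognise): a syllable starts at a Myanmar consonant unless that consonant is stacked under a virama (U+1039) or is itself a final consonant marked by asat (U+103A). The function name `break_syllables` and the pattern are hypothetical, and the sketch ignores independent vowels, digits, punctuation and non-Myanmar text.

```python
import re

# Simplified boundary rule (an assumption for illustration, not the paper's grammar):
# a syllable begins before a consonant (U+1000..U+1021) that
#   - is not preceded by virama U+1039 (i.e. it is not a stacked consonant), and
#   - is not followed by asat U+103A or virama U+1039, optionally after
#     dot-below U+1037 (i.e. it is not a syllable-final or kinzi consonant).
SYLLABLE_START = re.compile(
    "(?<!\u1039)(?=[\u1000-\u1021](?!\u1037?[\u103A\u1039]))"
)


def break_syllables(text: str) -> list[str]:
    """Split Unicode-encoded Myanmar text at the zero-width boundaries above."""
    # Splitting on empty matches requires Python 3.7+; drop empty pieces.
    return [piece for piece in SYLLABLE_START.split(text) if piece]


if __name__ == "__main__":
    # "Myanmar": MA, medial RA, NA, asat, MA, vowel AA
    word = "\u1019\u103C\u1014\u103A\u1019\u102C"
    print(break_syllables(word))
```

Under this rule a kinzi or stacked cluster is never split, so the consonants on both sides of the virama stay in one segment. Whether that matches the segmentation convention used in the paper is not stated in the abstract, so treat it as a property of this sketch only.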