Open Access
Automatic syllable segmentation of Myanmar texts using finite state transducer
Author(s) - Tin Htay Hlaing, Yoshiki Mikami
Publication year - 2014
Publication title - International Journal on Advances in ICT for Emerging Regions (ICTer)
Language(s) - English
Resource type - Journals
eISSN - 2550-2794
pISSN - 1800-4156
DOI - 10.4038/icter.v6i2.7150
Subject(s) - syllabification, computer science, syllable, artificial intelligence, natural language processing, unicode, speech recognition, grammar, scripting language, transliteration, linguistics, philosophy, operating system
Automatic syllabification lies at the heart of script processing, especially for Southeast Asian scripts such as Myanmar. Myanmar syllabification algorithms implemented so far take either a rule-based or a data-driven approach. This paper proposes a new method for Myanmar syllabification that uses a formal grammar and unweighted finite state transducers (FSTs), since Myanmar syllabification relies heavily on a formal model of syllable structure. The proposed method focuses on orthographic syllabification of input texts encoded in Unicode. We handle Myanmar words with standard syllable structure as well as words with irregular structures, such as kinzi and consonant stacking, which have not been resolved by previous methods. Our FST-based syllabifier was tested on 11,732 distinct words extracted from the Myanmar Orthography Corpus. These 11,732 words yielded 32,238 syllables, which were compared with correctly hand-syllabified words. The FST-based method achieves 99.93% accuracy. We used the Stuttgart FST tools for our experiments.
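To make the idea of orthographic syllabification over Unicode input more concrete, the following is a minimal Python sketch of a boundary-inserting pass with one character of context on each side. It is not the grammar or the transducer from the paper, and it does not use the Stuttgart FST tools. The function names `is_consonant` and `syllabify`, the `|` separator, and the single rule used here are illustrative assumptions: a consonant (U+1000 to U+1021) is taken to start a new syllable unless it is subjoined (preceded by virama, U+1039) or is itself followed by asat (U+103A) or virama. Under this simplified rule, kinzi and stacked consonants stay attached to their neighbours instead of being split, and independent vowels, digits, and punctuation are not treated specially.

```python
ASAT = "\u103a"    # MYANMAR SIGN ASAT: marks a syllable-final consonant
VIRAMA = "\u1039"  # MYANMAR SIGN VIRAMA: stacks the following consonant


def is_consonant(ch: str) -> bool:
    """True for the Myanmar consonant letters U+1000..U+1021 (assumed range)."""
    return "\u1000" <= ch <= "\u1021"


def syllabify(word: str, sep: str = "|") -> str:
    """Copy the input to the output, inserting `sep` before each consonant
    that is assumed to open a new syllable (simplified illustrative rule)."""
    out = []
    for i, ch in enumerate(word):
        prev = word[i - 1] if i > 0 else ""
        nxt = word[i + 1] if i + 1 < len(word) else ""
        starts_syllable = (
            is_consonant(ch)
            and prev != VIRAMA              # not a subjoined (stacked) consonant
            and nxt not in (ASAT, VIRAMA)   # not a final or the top of a stack
        )
        if starts_syllable and i > 0:       # never emit a leading separator
            out.append(sep)
        out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    # Two-syllable word: the separator lands before the second base consonant.
    print(syllabify("\u1019\u103c\u1014\u103a\u1019\u102c"))
    # Word containing kinzi (U+1004 U+103A U+1039): kinzi stays with its neighbours.
    print(syllabify("\u1021\u1004\u103a\u1039\u1002\u101c\u102d\u1015\u103a"))
```

A full syllabifier in the spirit of the abstract would replace this single rule with a formal syllable grammar compiled into a transducer, so that irregular structures are segmented rather than merely kept intact.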
