Open Access
An empirical evaluation of phrase-based statistical machine translation for Indonesia slang-word translator
Author(s) -
Kyrie Cettyara Eleison,
Sari Uli Inggrid Hutahaean,
Sarah Christine Tampubolon,
Teamsar Muliadi Panggabean,
Ike Fitriyaningsih
Publication year - 2022
Publication title -
Indonesian Journal of Electrical Engineering and Computer Science
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.241
H-Index - 17
eISSN - 2502-4760
pISSN - 2502-4752
DOI - 10.11591/ijeecs.v25.i3.pp1803-1813
Subject(s) - computer science , machine translation , slang , phrase , natural language processing , artificial intelligence , word (group theory) , benchmark (surveying) , span (engineering) , machine translation software usability , example based machine translation , linguistics , engineering , philosophy , civil engineering , geodesy , geography
The use of slang (non-standard language) is increasing, especially on social media. This hinders communication, because not everyone understands slang. The purpose of this work is to develop a slang-word translator. A second objective is to find the minimum number of sentences and the Bilingual Evaluation Understudy (BLEU) score that can serve as benchmarks for deciding whether a translation is understandable. The approach is phrase-based statistical machine translation (PBSMT), which is suitable for low-resource languages, with a dataset of 100,000 sentences taken from the comment sections of several online political news portals. The comments were manually translated to produce a parallel corpus of non-standard and standard language. Sample sentences were taken from the dataset and distributed in questionnaires to measure how well humans understand the translation results. The implementation achieves a BLEU score of 64, and the minimum number of sentences needed for an understandable machine translation is 500. The questionnaire responses indicate that humans can understand the sentences produced by the translator.
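
For readers unfamiliar with the BLEU metric named in the abstract, the following is a minimal sketch of a single-reference, sentence-level BLEU on a 0-100 scale, using clipped n-gram precision up to 4-grams and a brevity penalty. It is not the authors' evaluation code: the function names `ngrams` and `bleu`, the whitespace tokenization, the absence of smoothing, and the example sentences are all assumptions made for illustration. Published scores such as the one reported here are normally computed at corpus level with an established tool.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=4):
    """Single-reference, sentence-level BLEU on a 0-100 scale (no smoothing)."""
    # Assumption: plain whitespace tokenization, chosen only for illustration.
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        # Clip each candidate n-gram count by its count in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        if overlap == 0 or total == 0:
            return 0.0  # any zero precision makes the geometric mean zero
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty: penalize candidates shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return 100.0 * bp * math.exp(sum(log_precisions) / max_n)


# Hypothetical example: a system output compared with a manual
# standard-language reference (sentences invented for this sketch).
reference = "saya tidak tahu apa yang terjadi"
output = "saya tidak tahu apa yang terjadi di sana"
score = bleu(output, reference)
```

Because no smoothing is applied, any sentence that shares no 4-gram with its reference scores zero, which is one reason sentence-level values from a sketch like this are not comparable to a corpus-level figure.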
