Detecting and Integrating Multiword Expression into English-Arabic Statistical Machine Translation
Author(s) -
Sara Ebrahim,
Doaa Hegazy,
Mostafa G. M. Mostafa,
Samhaa R. El-Beltagy
Publication year - 2017
Publication title -
Procedia Computer Science
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.334
H-Index - 76
ISSN - 1877-0509
DOI - 10.1016/j.procs.2017.10.099
Subject(s) - computer science , natural language processing , phrase , machine translation , artificial intelligence , parsing , arabic , perspective (graphical) , evaluation of machine translation , translation (biology) , expression (computer science) , example based machine translation , machine translation software usability , linguistics , programming language , philosophy , biochemistry , chemistry , messenger rna , gene
In this paper we introduce a new method for detecting one type of English multiword expression (MWE), phrasal verbs, and integrating it into an English-Arabic phrase-based statistical machine translation (PBSMT) system. The method parses the English side of the parallel corpus, detects various linguistic patterns for phrasal verbs, and finally integrates them into the En-Ar PBSMT system. In addition, the paper explores the effect of cliticizing specific English words that have no Arabic equivalent. The results, reported as BLEU scores, show that some patterns achieved significant improvements over other patterns, although the baseline still achieves the highest score. The paper suggests that translation quality could be improved by detecting more linguistic patterns and integrating them into the En-Ar SMT system with other integration methods. The results also indicate which path is worth following and support the view that linguistic features are not handled properly in statistically learned models.
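To make the detection step concrete, the following is a minimal sketch of how one phrasal-verb pattern (a verb with an attached particle) might be found in a dependency parse and rewritten as a single token before training. It is not the authors' implementation: the `Token` structure, the `prt` dependency label, the `VERB` tag, and the underscore-joining strategy are all assumptions chosen for illustration, and the abstract does not state which parser, patterns, or integration method the paper actually uses.

```python
from typing import Dict, List, NamedTuple


class Token(NamedTuple):
    """One pre-parsed token (hypothetical format, not from the paper)."""
    idx: int      # 1-based position in the sentence
    form: str     # surface word
    pos: str      # coarse part-of-speech tag
    head: int     # index of the syntactic head (0 = root)
    deprel: str   # dependency label


def merge_phrasal_verbs(tokens: List[Token], sep: str = "_") -> List[str]:
    """Join each verb with its particle into one token, e.g. 'turned_off'.

    Assumption: the parser marks verb particles with the label 'prt'.
    Handles both contiguous ('turned off the light') and split
    ('turned the light off') patterns, since it follows the head link.
    """
    # Map verb index -> its particle token.
    particles: Dict[int, Token] = {}
    for tok in tokens:
        if tok.deprel != "prt":
            continue
        if 1 <= tok.head <= len(tokens) and tokens[tok.head - 1].pos == "VERB":
            particles[tok.head] = tok

    consumed = {p.idx for p in particles.values()}

    merged: List[str] = []
    for tok in tokens:
        if tok.idx in consumed:
            continue  # particle already attached to its verb
        if tok.idx in particles:
            merged.append(tok.form + sep + particles[tok.idx].form)
        else:
            merged.append(tok.form)
    return merged


# Hand-written parse of "She turned the light off" (illustrative only).
example = [
    Token(1, "She", "PRON", 2, "nsubj"),
    Token(2, "turned", "VERB", 0, "root"),
    Token(3, "the", "DET", 4, "det"),
    Token(4, "light", "NOUN", 2, "dobj"),
    Token(5, "off", "ADP", 2, "prt"),
]

print(" ".join(merge_phrasal_verbs(example)))
```

Rewriting the English side this way would let a phrase-based system align the merged unit as a whole with its Arabic counterpart. Whether that helps is an empirical question; the abstract reports that the baseline still scored highest in BLEU.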