Feature-Based Decipherment for Machine Translation
Author(s) -
Iftekhar Naim,
Parker Riley,
Daniel Gildea
Publication year - 2018
Publication title -
Computational Linguistics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.314
H-Index - 98
eISSN - 1530-9312
pISSN - 0891-2017
DOI - 10.1162/coli_a_00326
Subject(s) - decipherment , computer science , artificial intelligence , orthographic projection , feature (linguistics) , machine translation , inference , generative model , divergence (linguistics) , natural language processing , generative grammar , pattern recognition (psychology) , linguistics , philosophy
Orthographic similarities across languages provide a strong signal for unsupervised probabilistic transduction (decipherment) for closely related language pairs. The existing decipherment models, however, are not well suited for exploiting these orthographic similarities. We propose a log-linear model with latent variables that incorporates orthographic similarity features. Maximum likelihood training is computationally expensive for the proposed log-linear model. To address this challenge, we perform approximate inference via Markov chain Monte Carlo sampling and contrastive divergence. Our results show that the proposed log-linear model with contrastive divergence outperforms the existing generative decipherment models by exploiting the orthographic features. The model both scales to large vocabularies and preserves accuracy in low- and no-resource contexts.
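To make the idea of an orthographic similarity feature inside a log-linear model concrete, here is a minimal sketch. It is not the authors' implementation. The single feature (one minus normalized edit distance), the fixed weight, the function names, and the toy candidate list are all assumptions chosen for illustration. It also normalizes exactly over a tiny candidate set, which is the step that becomes expensive at vocabulary scale and that the abstract says is handled with approximate inference instead. Latent variables and contrastive divergence are not shown.

```python
import math


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def orthographic_similarity(a: str, b: str) -> float:
    """Hypothetical feature: 1 minus edit distance normalized by the longer length."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


def translation_distribution(source: str, candidates: list[str],
                             weight: float = 5.0) -> dict[str, float]:
    """Log-linear distribution over candidates: p(e | f) proportional to exp(w * phi(f, e)).

    The weight is an assumed constant here; in a trained model it would be learned.
    """
    scores = [weight * orthographic_similarity(source, c) for c in candidates]
    top = max(scores)  # subtract the max before exponentiating for numerical stability
    exps = [math.exp(s - top) for s in scores]
    z = sum(exps)
    return {c: e / z for c, e in zip(candidates, exps)}
```

Calling `translation_distribution("nacion", ["nation", "house", "water"])` puts most of the probability mass on `"nation"`, because it is the only candidate that is orthographically close to the source word.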