Pushdown Automata in Statistical Machine Translation
Author(s) - Cyril Allauzen, Bill Byrne, Adrià de Gispert, Gonzalo Iglesias, Michael Riley
Publication year - 2014
Publication title - Computational Linguistics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.314
H-Index - 98
eISSN - 1530-9312
pISSN - 0891-2017
DOI - 10.1162/coli_a_00197
Subject(s) - computer science , machine translation , finite state machine , synchronous context free grammar , rule based machine translation , pushdown automaton , automaton , context (archaeology) , natural language processing , transfer based machine translation , artificial intelligence , example based machine translation , translation (biology) , context free grammar , decoding methods , sentence , task (project management) , grammar , theoretical computer science , algorithm , paleontology , biochemistry , chemistry , linguistics , management , philosophy , messenger rna , gene , economics , biology
This article describes the use of pushdown automata (PDAs) in the context of statistical machine translation and alignment under a synchronous context-free grammar. We use PDAs to compactly represent the space of candidate translations generated by the grammar when applied to an input sentence. General-purpose PDA algorithms for replacement, composition, shortest path, and expansion are presented. We describe HiPDT, a hierarchical phrase-based decoder using the PDA representation and these algorithms. We contrast the complexity of this decoder with a decoder based on a finite-state automaton representation, showing that PDAs provide a more suitable framework to achieve exact decoding for larger synchronous context-free grammars and smaller language models. We assess this experimentally on a large-scale Chinese-to-English alignment and translation task. In translation, we propose a two-pass decoding strategy involving a weaker language model in the first pass to address the results of PDA complexity analysis. We study in depth the experimental conditions and tradeoffs in which HiPDT can achieve state-of-the-art performance for large-scale SMT.
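As a rough illustration of why a PDA can be more compact than a finite-state automaton, the sketch below encodes a toy PDA as a graph whose arcs carry either a terminal symbol or a named parenthesis, and accepts only paths whose parentheses balance. It is a minimal sketch under stated assumptions, not the HiPDT implementation: the `Arc` type, the `ARCS` table, and the `accepted_strings` helper are hypothetical names introduced here. The toy grammar is `S -> a X b X`, `X -> x | y`, and the sub-automaton for `X` (states 10 to 11) is stored once and shared by both call sites.

```python
from typing import NamedTuple


class Arc(NamedTuple):
    src: int
    dst: int
    label: str  # terminal symbol, or the name of a parenthesis
    kind: str   # "sym", "open", or "close"


# Hypothetical toy PDA for S -> a X b X, X -> x | y.
# States 10 -> 11 hold the single shared copy of X; parentheses "1" and "2"
# record which call site was entered so the path returns to the right place.
ARCS = [
    Arc(0, 1, "a", "sym"),
    Arc(1, 10, "1", "open"),    # enter X from the first call site
    Arc(10, 11, "x", "sym"),
    Arc(10, 11, "y", "sym"),
    Arc(11, 2, "1", "close"),   # return to the first call site
    Arc(2, 3, "b", "sym"),
    Arc(3, 10, "2", "open"),    # enter X from the second call site
    Arc(11, 4, "2", "close"),   # return to the second call site
]


def accepted_strings(arcs, start, final):
    """Enumerate strings on balanced paths of an acyclic PDA (naive expansion)."""
    results = []

    def walk(state, stack, words):
        # A path is accepted only at the final state with an empty stack.
        if state == final and not stack:
            results.append(" ".join(words))
        for arc in arcs:
            if arc.src != state:
                continue
            if arc.kind == "sym":
                walk(arc.dst, stack, words + [arc.label])
            elif arc.kind == "open":
                walk(arc.dst, stack + (arc.label,), words)
            elif stack and stack[-1] == arc.label:
                # A close arc is usable only if it matches the top of the stack.
                walk(arc.dst, stack[:-1], words)

    walk(start, (), [])
    return sorted(results)


print(accepted_strings(ARCS, 0, 4))
```

Without the stack check, the path that enters `X` through parenthesis `1` and leaves through parenthesis `2` would wrongly yield the string `a x`. An equivalent finite-state automaton would need one copy of the `X` sub-automaton per call site, which is the kind of blow-up the PDA representation avoids.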