A Pattern‐Based Approach Using Compound Unit Recognition and Its Hybridization with Rule‐Based Translation
Author(s) - Jung Hanmin, Yuh Sanghwa, Kim Taewan, Park Sangkyu
Publication year - 1999
Publication title - Computational Intelligence
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.353
H-Index - 52
eISSN - 1467-8640
pISSN - 0824-7935
DOI - 10.1111/0824-7935.00087
Subject(s) - computer science, parsing, artificial intelligence, natural language processing, trie, machine translation, speech recognition, pruning, mathematics, data structure, programming language, statistics
This paper describes a compound unit (CU) recognizer as a pattern‐based approach and its hybridization with rule‐based translation. A compound unit is a combined concept that covers collocations, idioms, and compound nouns. CU recognition reduces part‐of‐speech ambiguities by combining several words into a single unit, which in turn lessens the parsing load. It also provides pretranslated natural equivalents. Our focus in this paper is to obtain flexibility and efficiency from pattern‐based machine translation and high‐quality translation through hybridization. Our search index structure, a modified trie, uses a “method” strategy to manage the heterogeneous properties of the constituents. Syntactic verification is integrated to make CU recognition precise by pruning wrongly recognized units that are caused by improper variable hypotheses. Experimental results with verification show that the precision of CU recognition increases to 99.69% with 31 CFG rules on the cyclic trie structure for 1,268 Wall Street Journal articles from the Penn Treebank. Another experiment shows that CU recognition also raises the understandability of translations of Web documents.
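The following is a minimal sketch of the general idea of trie-based compound unit recognition. It is not the authors' method. The sketch assumes a plain, acyclic word-level trie rather than the paper's modified cyclic trie, and it omits the "method" strategy entirely. A hypothetical `*` slot stands in for a variable that matches exactly one token. The `verify` callback is a hypothetical placeholder for syntactic verification; the paper's 31 CFG rules are not reproduced here. Recognition is greedy, left to right, and longest match first. When the verifier rejects a candidate, the recognizer falls back to a shorter unit or to the plain word. The class name, method names, and bracketed glosses are all illustrative.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterator, List, Optional, Tuple

WILDCARD = "*"  # hypothetical variable slot: matches exactly one token


@dataclass
class TrieNode:
    children: Dict[str, "TrieNode"] = field(default_factory=dict)
    # Pretranslated equivalent, stored on the node that ends a compound unit.
    translation: Optional[str] = None


class CompoundUnitTrie:
    """Toy word-level trie of compound units (collocations, idioms, compound nouns)."""

    def __init__(self) -> None:
        self.root = TrieNode()

    def add(self, pattern: List[str], translation: str) -> None:
        node = self.root
        for word in pattern:
            node = node.children.setdefault(word.lower(), TrieNode())
        node.translation = translation

    def _matches(self, tokens: List[str], start: int) -> Iterator[Tuple[int, str]]:
        """Yield (end, translation) for every pattern matching tokens[start:end]."""
        stack = [(self.root, start)]  # depth-first walk over trie paths
        while stack:
            node, i = stack.pop()
            if node.translation is not None and i > start:
                yield i, node.translation
            if i == len(tokens):
                continue
            keys = [tokens[i].lower()]
            if keys[0] != WILDCARD:
                keys.append(WILDCARD)  # a variable edge consumes any one token
            for key in keys:
                child = node.children.get(key)
                if child is not None:
                    stack.append((child, i + 1))

    def recognize(
        self,
        tokens: List[str],
        verify: Callable[[List[str]], bool] = lambda span: True,
    ) -> List[Tuple[str, str, Optional[str]]]:
        """Greedy left-to-right recognition, trying the longest candidate first.

        `verify` stands in for syntactic verification: a rejected span is pruned
        and the next shorter candidate (or the single word) is used instead.
        """
        out: List[Tuple[str, str, Optional[str]]] = []
        i = 0
        while i < len(tokens):
            candidates = sorted(self._matches(tokens, i), key=lambda m: m[0], reverse=True)
            for end, translation in candidates:
                if verify(tokens[i:end]):
                    out.append(("CU", " ".join(tokens[i:end]), translation))
                    i = end
                    break
            else:  # no candidate, or every candidate was pruned
                out.append(("WORD", tokens[i], None))
                i += 1
        return out


if __name__ == "__main__":
    trie = CompoundUnitTrie()
    trie.add(["interest", "rate"], "[interest-rate]")
    trie.add(["take", "*", "into", "account"], "[consider *]")
    print(trie.recognize("Analysts take rates into account".split()))
    # [('WORD', 'Analysts', None), ('CU', 'take rates into account', '[consider *]')]
```

Merging "take rates into account" into one unit is what lets a later parser treat it as a single item with a ready-made equivalent. A stricter `verify`, for example one that checks the category of the word bound to `*`, would prune bad variable bindings in the spirit of the verification step described above.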