z-logo
open-access-imgOpen Access
Machine Translation and the Information Soup
Author(s) -
David Farwell,
Laurie Gerber,
Eduard Hovy
Publication year - 1998
Publication title -
lecture notes in computer science
Language(s) - English
Resource type - Book series
SCImago Journal Rank - 0.249
H-Index - 400
eISSN - 1611-3349
pISSN - 0302-9743
DOI - 10.1007/3-540-49478-2
Subject(s) - computer science , machine translation , translation (biology) , natural language processing , artificial intelligence , information retrieval , biochemistry , chemistry , messenger rna , gene
We present two problems for statistically extracting bilingual lexicon: (1) How can noisy parallel corpora be used? (2) How can non-parallel yet comparable corpora be used? We describe our own work and contribution in relaxing the constraint of using only clean parallel corpora. DKvec is a method for extracting bilingual lexicons, from noisy parallel corpora based on arrival distances of words in noisy parallel corpora. Using DKvec on noisy parallel corpora in English/Japanese and English/Chinese, our evaluations show a 55.35% precision from a small corpus and 89.93% precision from a larger corpus. Our major contribution is in the extraction of bilingual lexicon from non-parallel corpora. We present a rst such result in this area, from a new method{Convec. Convec is based on context information of a word to be translated. We show a 30% to 76% precision when top-one to top-20 translation candidates are considered. Most of the top-20 candidates are either collocations or words related to the correct translation. Since non-parallel corpora contain a lot more polysemous words, many-to-many translations, and di erent lexical items in the two languages, we conclude that the output from Convec is reasonable and useful.

The content you want is available to Zendy users.

Already have an account? Click here to sign in.
Having issues? You can contact us here
Accelerating Research

Address

John Eccles House
Robert Robinson Avenue,
Oxford Science Park, Oxford
OX4 4GP, United Kingdom