Statistical Machine Translation of Croatian Weather Forecasts: How Much Data Do We Need?

Nikola Ljubešić, Petra Bago, Damir Boras

Open Access

Author(s) - Nikola Ljubešić, Petra Bago, Damir Boras

Publication year - 2010

Publication title - Journal of Computing and Information Technology

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 0.169

H-Index - 27

eISSN - 1846-3908

pISSN - 1330-1136

DOI - 10.2498/cit.1001917

Subject(s) - computer science, nist, croatian, fluency, machine translation, set (abstract data type), sample (material), sentence, training set, natural language processing, artificial intelligence, linguistics, philosophy, chemistry, chromatography, programming language

This research is the first step towards developing a system for translating Croatian weather forecasts into multiple languages. This step deals with the Croatian-English language pair. The parallel corpus is a one-year sample of the weather forecasts for the Adriatic, consisting of 7,893 sentence pairs. Evaluation is performed with the automatic evaluation measures BLEU, NIST and METEOR, as well as by manually evaluating a sample of 200 translations. We have shown that with a small-sized training set and the state-of-the-art Moses system, decoding can be done with 96% accuracy concerning adequacy and fluency. Additional improvement is expected by increasing the training set size. Finally, the correlation of the recorded evaluation measures is explored.
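
The abstract mentions n-gram-based automatic measures and a correlation analysis between evaluation measures. As a rough illustration only, the sketch below shows the clipped n-gram precision that BLEU-style metrics build on, and a plain Pearson correlation between two lists of scores. It is not the authors' code or the full BLEU, NIST or METEOR definitions; the function names, the example sentences and any scores passed in are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def clipped_precision(hypothesis, reference, n):
    """Clipped (modified) n-gram precision of one hypothesis against one reference.

    Each hypothesis n-gram is credited at most as many times as it
    occurs in the reference, which is the building block of BLEU.
    """
    hyp_counts = Counter(ngrams(hypothesis, n))
    ref_counts = Counter(ngrams(reference, n))
    total = sum(hyp_counts.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, ref_counts[gram]) for gram, count in hyp_counts.items())
    return matched / total


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical example sentences (not taken from the paper's corpus).
hyp = "wind from the north".split()
ref = "wind from the south".split()
unigram_p = clipped_precision(hyp, ref, 1)
bigram_p = clipped_precision(hyp, ref, 2)
```

Given per-sentence or per-system scores from two measures, `pearson(scores_a, scores_b)` returns a value between -1 and 1. The abstract does not state which correlation coefficient was used, so Pearson is an assumption here.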
