Ways to Improve N-Gram Language Models for OCR and Speech Recognition of Slavic Languages

Open Access

Author(s) - Владимир Юрьевич Тарануха

Publication year - 2014

Publication title - The Advanced Science Journal

Language(s) - English

Resource type - Journals

eISSN - 2219-7478

pISSN - 2219-746X

DOI - 10.15550/asj.2014.04.065

Subject(s) - slavic languages, computer science, linguistics, natural language processing, speech recognition, artificial intelligence, philosophy

This paper investigates the problems of n-gram language models in OCR and speech recognition for Slavic languages and proposes methods applicable to most of them. Two approaches are tested: filtering the n-gram model, and alternative ways of smoothing it. The filtering relies on heuristics based on word frequencies and morphological features. The smoothing uses classes built from morphological features in combination with a new discounting formula, and it can also be combined with inner filtering. Numerical experiments for Ukrainian show that both approaches produce interesting results. Smoothing is the more promising of the two, but it is also more complex: outperforming standard smoothing techniques will require further investigation into developing proper classes based on morphological information.
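
To make the two ideas in the abstract concrete, here is a minimal Python sketch of a bigram model with a frequency-based filter and a discounting step. It is an illustration under stated assumptions, not the paper's method: the abstract does not give the filtering heuristics, the morphological classes, or the new discounting formula, so the sketch substitutes a plain count threshold and textbook interpolated absolute discounting. The names `train_bigrams`, `bigram_prob`, `min_count`, and `discount` are hypothetical.

```python
from collections import Counter


def train_bigrams(tokens, min_count=2):
    """Count unigrams and bigrams, then filter out rare bigrams.

    Assumption: a simple count threshold stands in for the paper's
    frequency- and morphology-based filtering heuristics.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    kept = Counter({bg: c for bg, c in bigrams.items() if c >= min_count})
    return unigrams, kept


def bigram_prob(w1, w2, unigrams, bigrams, discount=0.5):
    """P(w2 | w1) with interpolated absolute discounting.

    Assumption: this is the standard textbook scheme, not the new
    discounting formula proposed in the paper.
    """
    total = sum(unigrams.values())
    p_uni = unigrams[w2] / total

    # Words observed after w1 among the bigrams that survived filtering.
    followers = {b: c for (a, b), c in bigrams.items() if a == w1}
    context = sum(followers.values())
    if context == 0:
        # Unseen (or fully filtered) context: fall back to the unigram model.
        return p_uni

    # Mass removed from seen bigrams is redistributed via the unigram model.
    backoff_mass = discount * len(followers) / context
    discounted = max(followers.get(w2, 0) - discount, 0) / context
    return discounted + backoff_mass * p_uni


tokens = "a b a b a c".split()
unigrams, bigrams = train_bigrams(tokens, min_count=2)
```

With this toy input, the bigram `("a", "c")` occurs once and is filtered out, yet `bigram_prob("a", "c", unigrams, bigrams)` is still nonzero because the discounted mass is redistributed through the unigram model. A class-based variant in the spirit of the abstract would count over morphological classes instead of word forms, which requires a morphological analyzer that this sketch does not assume.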
