Corpus-based technique for improving Arabic OCR system

Ahmed H. Aliwy; Basheer Al-Sadawi

Open Access

Author(s) - Ahmed H. Aliwy, Basheer Al-Sadawi

Publication year - 2021

Publication title - Indonesian Journal of Electrical Engineering and Computer Science

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 0.241

H-Index - 17

eISSN - 2502-4760

pISSN - 2502-4752

DOI - 10.11591/ijeecs.v21.i1.pp233-241

Subject(s) - computer science, optical character recognition, natural language processing, character (mathematics), arabic, word (group theory), artificial intelligence, sentence, context (archaeology), speech recognition, process (computing), span (engineering), image (mathematics), linguistics, engineering, mathematics, programming language, paleontology, philosophy, civil engineering, geometry, biology

Optical character recognition (OCR) is the process of converting images of text documents into editable, searchable text. OCR poses several challenges, particularly for the Arabic language, where it produces a high percentage of errors. This paper proposes a method for improving the output of Arabic optical character recognition (AOCR) systems, based on a statistical language model built from the large corpora available. The method detects and corrects both non-word and real-word errors according to the context of the word in the sentence. The results show that the method improves AOCR output, reaching an accuracy of up to 98%.
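The abstract does not give the details of the language model, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a bigram model with add-one smoothing, candidates within one character edit, and a fixed replacement threshold; the toy English corpus, the Latin `ALPHABET`, and all function names are hypothetical stand-ins (an Arabic system would use Arabic text and the Arabic letter set).

```python
from collections import Counter

# Hypothetical toy corpus standing in for the large corpora in the abstract.
CORPUS = [
    "the cat sat on the mat",
    "the cat ate the fish",
    "the dog sat on the rug",
    "a cat sat on a mat",
]
ALPHABET = "abcdefghijklmnopqrstuvwxyz"  # assumption: replace with Arabic letters


def train(sentences):
    """Count unigrams and bigrams, using <s> as a sentence-start token."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split()
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def edits1(word):
    """All strings one deletion, substitution, or insertion away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    replaces = [a + c + b[1:] for a, b in splits if b for c in ALPHABET]
    inserts = [a + c + b for a, b in splits for c in ALPHABET]
    return set(deletes + replaces + inserts)


def bigram_prob(prev, word, unigrams, bigrams):
    """Add-one smoothed estimate of P(word | prev)."""
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + len(unigrams))


def correct(sentence, unigrams, bigrams, threshold=2.0):
    """Fix non-word errors, and real-word errors that fit the context poorly."""
    tokens = sentence.split()
    out = []
    for i, word in enumerate(tokens):
        prev = out[-1] if out else "<s>"
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None

        def score(w):
            # Left context always counts; right context only if it is a known word.
            s = bigram_prob(prev, w, unigrams, bigrams)
            if nxt in unigrams:
                s *= bigram_prob(w, nxt, unigrams, bigrams)
            return s

        candidates = sorted(c for c in edits1(word) if c in unigrams)
        if word not in unigrams:
            # Non-word error: take the best in-vocabulary candidate, if any.
            best = max(candidates, key=score, default=word)
        else:
            # Real-word error: replace only if a candidate is far more likely.
            best = max(candidates + [word], key=score)
            if score(best) < threshold * score(word):
                best = word
        out.append(best)
    return " ".join(out)


unigrams, bigrams = train(CORPUS)
print(correct("the cat sat on the mzt", unigrams, bigrams))  # non-word error
print(correct("the cat mat on the mat", unigrams, bigrams))  # real-word error
```

In this sketch a word missing from the vocabulary is always treated as an error, while a known word is replaced only when some neighbour scores at least `threshold` times higher in context. That is why the final "mat" in the second example is kept even though the first one is changed to "sat".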
