Open Access
Ontologies and Bigram-based approach for Isolated Non-word Errors Correction in OCR System
Author(s) -
Aicha Eutamene,
Mohamed-Khireddine Kholladi,
Hacene Belhadef
Publication year - 2015
Publication title -
International Journal of Electrical and Computer Engineering
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.277
H-Index - 22
ISSN - 2088-8708
DOI - 10.11591/ijece.v5i6.pp1458-1467
Subject(s) - bigram , computer science , error detection and correction , spelling , word (group theory) , artificial intelligence , natural language processing , speech recognition , wordnet , optical character recognition , character (mathematics) , pattern recognition (psychology) , algorithm , image (mathematics) , linguistics , philosophy , geometry , mathematics , trigram
In this paper, we describe a new and original approach to the post-processing step of an OCR system. The approach relies on a new spelling-correction method that automatically corrects misspelled words produced by the character recognition step on scanned documents. It combines ontologies and bigram codification to build a robust system able to automatically resolve the anomalies of classical approaches. The proposed hybrid method works in two stages: the first is character recognition using the ontological model, and the second is word recognition using a spelling-correction approach based on bigram codification for error detection and correction. Spelling errors are broadly classified into two categories: non-word errors and real-word errors. In this paper, we are interested only in the detection and correction of non-word errors, because this is the only type of error treated by an OCR system. In addition, the use of an online external resource such as WordNet proves necessary to improve the system's performance.
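The abstract does not spell out how the bigram codification works, so the following is only a minimal sketch of a generic character-bigram technique for isolated non-word errors, not the authors' method. It assumes that a token is a non-word error when it is absent from a lexicon. Candidates are then ranked by the Dice coefficient over padded character bigrams. The small in-memory lexicon and the names `bigrams`, `dice`, `is_non_word` and `correct` are hypothetical and stand in for an external resource such as WordNet.

```python
from collections import Counter


def bigrams(word: str) -> Counter:
    """Multiset of character bigrams, padded with '#' so word boundaries count."""
    padded = f"#{word.lower()}#"
    return Counter(padded[i:i + 2] for i in range(len(padded) - 1))


def dice(a: str, b: str) -> float:
    """Dice similarity between the bigram multisets of two words (0.0 to 1.0)."""
    ba, bb = bigrams(a), bigrams(b)
    shared = sum((ba & bb).values())          # bigrams common to both words
    total = sum(ba.values()) + sum(bb.values())
    return 2 * shared / total if total else 0.0


def is_non_word(token: str, lexicon: set[str]) -> bool:
    """Assumed detection rule: a non-word error is any token missing from the lexicon."""
    return token.lower() not in lexicon


def correct(token: str, lexicon: set[str], k: int = 3) -> list[str]:
    """Return the token itself if valid, else the k most bigram-similar lexicon words."""
    if not is_non_word(token, lexicon):
        return [token]
    # Sort by descending similarity; break ties alphabetically for determinism.
    ranked = sorted(lexicon, key=lambda w: (-dice(token, w), w))
    return ranked[:k]


# Hypothetical toy lexicon standing in for a real dictionary or WordNet.
LEXICON = {"recognition", "character", "correction", "ontology", "spelling"}

if __name__ == "__main__":
    # An OCR-style confusion of 'i' with 'l'.
    print(correct("recognitlon", LEXICON))
```

Here the OCR-style misreading "recognitlon" shares 10 of its 12 padded bigrams with "recognition", so that word ranks first. A valid word such as "spelling" is returned unchanged. Bigram overlap is attractive for OCR output because a single misrecognised character disturbs at most two bigrams and leaves the rest of the word's profile intact.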
