OCRSpell: an interactive spelling correction system for OCR errors in text
Author(s) -
Kazem Taghva,
Eric Stofsky
Publication year - 2001
Publication title -
International Journal on Document Analysis and Recognition (IJDAR)
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.331
H-Index - 50
eISSN - 1433-2833
pISSN - 1433-2825
DOI - 10.1007/pl00013558
Subject(s) - spelling , computer science , string (physics) , artificial intelligence , natural language processing , feature (linguistics) , confusion , matching (statistics) , error detection and correction , speech recognition , information retrieval , algorithm , linguistics , mathematics , psychology , philosophy , statistics , psychoanalysis , mathematical physics
In this paper, we describe a spelling correction system designed specifically for OCR-generated text that selects candidate words through the use of information gathered from multiple knowledge sources. This system for text correction is based on static and dynamic device mappings, approximate string matching, and n-gram analysis. Our statistically based, Bayesian system incorporates a learning feature that collects confusion information at the collection and document levels. An evaluation of the new system is presented as well.
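To make the abstract's pipeline concrete, here is a minimal sketch of one plausible reading of Bayesian candidate selection for an OCR word. It is not the authors' implementation. It assumes a hypothetical table of static device mappings (OCR output substring, intended substring, assumed probability), a tiny hypothetical lexicon whose counts serve as the prior, and a generic edit-distance-1 search standing in for approximate string matching. Candidates are ranked by P(ocr | word) * P(word), which is proportional to the posterior. The dynamic mappings, n-gram analysis, and learning feature are omitted.

```python
from collections import Counter

# Hypothetical static device mappings: (OCR output substring, intended substring,
# assumed P(output | intended)). Illustrative values only, not from the paper.
STATIC_MAPPINGS = [
    ("rn", "m", 0.30),
    ("cl", "d", 0.20),
    ("1", "l", 0.25),
    ("0", "o", 0.25),
]

# Hypothetical lexicon; corpus counts serve as the prior P(word).
LEXICON = Counter({"modern": 40, "model": 60, "hello": 30, "dear": 20, "mode": 25})
TOTAL = sum(LEXICON.values())

# Assumed likelihood for any single generic edit (insert/delete/replace/transpose).
EDIT_PROB = 0.01


def prior(word):
    """P(word) estimated from lexicon counts."""
    return LEXICON[word] / TOTAL


def mapping_candidates(ocr_word):
    """Yield (candidate, likelihood) by undoing one device mapping at one position."""
    for produced, intended, p in STATIC_MAPPINGS:
        start = ocr_word.find(produced)
        while start != -1:
            yield ocr_word[:start] + intended + ocr_word[start + len(produced):], p
            start = ocr_word.find(produced, start + 1)


def edits1(word):
    """All strings one insert, delete, replace, or adjacent transpose away."""
    letters = "abcdefghijklmnopqrstuvwxyz"
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {a + b[1:] for a, b in splits if b}
    replaces = {a + c + b[1:] for a, b in splits if b for c in letters}
    inserts = {a + c + b for a, b in splits for c in letters}
    transposes = {a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1}
    return deletes | replaces | inserts | transposes


def rank_candidates(ocr_word):
    """Return lexicon candidates sorted by P(ocr | word) * P(word), highest first."""
    if ocr_word in LEXICON:
        # Accept a word that is already in the lexicon as-is.
        return [(ocr_word, prior(ocr_word))]
    scores = {}
    # Candidates from device mappings use the mapping's likelihood.
    for cand, p in mapping_candidates(ocr_word):
        if cand in LEXICON:
            scores[cand] = max(scores.get(cand, 0.0), prior(cand) * p)
    # Approximate-match fallback uses the generic edit likelihood.
    for cand in edits1(ocr_word):
        if cand in LEXICON:
            scores[cand] = max(scores.get(cand, 0.0), prior(cand) * EDIT_PROB)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    for word in ["rnodern", "clear", "hel1o", "mdel"]:
        print(word, rank_candidates(word))
```

For example, `rank_candidates("rnodern")` ranks `modern` first because undoing the assumed `rn` to `m` confusion yields a lexicon word. When a mapping and a generic edit reach the same word, the sketch keeps the higher score. In a system with the learning feature the abstract describes, the mapping probabilities would presumably be updated from confusions observed in the document or collection rather than fixed.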