Open Access
A font and size-independent OCR system for printed Kannada documents using support vector machines
Author(s) - T. V. Ashwin, P. S. Sastry
Publication year - 2002
Publication title - Sadhana
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.268
H-Index - 49
eISSN - 0973-7677
pISSN - 0256-2499
DOI - 10.1007/bf02703311
Subject(s) - computer science, font, optical character recognition, support vector machine, artificial intelligence, kannada, document layout analysis, set (abstract data type), segmentation, character (mathematics), natural language processing, software, pattern recognition (psychology), image (mathematics), speech recognition, programming language, geometry, mathematics
This paper describes an OCR system for printed text documents in Kannada, a South Indian language. The input to the system is a scanned image of a page of text, and the output is a machine-editable file compatible with most typesetting software. The system first extracts words from the document image and then segments each word into sub-character-level pieces. The segmentation algorithm is motivated by the structure of the script. We propose a novel set of features for the recognition problem that are computationally simple to extract. The final recognition is achieved by employing a number of two-class classifiers based on the Support Vector Machine (SVM) method. The recognition is independent of the font and size of the printed text, and the system is seen to deliver reasonable performance.
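
The abstract does not say how words are extracted from the page image. As a rough illustration only, the sketch below assumes a vertical projection profile, a common choice for printed text: ink pixels are counted per column of a binarized text line, and a sufficiently wide run of blank columns is treated as a word boundary. The function names, the `min_gap` parameter, and the toy image are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple


def column_profile(image: List[List[int]]) -> List[int]:
    """Count ink pixels (value 1) in each column of a binary text-line image."""
    return [sum(column) for column in zip(*image)]


def extract_words(image: List[List[int]], min_gap: int = 2) -> List[Tuple[int, int]]:
    """Return (start, end) column spans, end exclusive, one per word.

    Assumption: a run of at least `min_gap` blank columns separates two
    words; shorter blank runs are treated as gaps inside a word.
    """
    spans: List[Tuple[int, int]] = []
    start = None      # first inked column of the word being built
    last_ink = None   # most recent inked column seen
    for x, count in enumerate(column_profile(image)):
        if count == 0:
            continue
        if start is None:
            start = x
        elif x - last_ink - 1 >= min_gap:
            # The blank run is wide enough: close the current word.
            spans.append((start, last_ink + 1))
            start = x
        last_ink = x
    if start is not None:
        spans.append((start, last_ink + 1))
    return spans


# Toy binarized text line (1 = ink), two rows high.
rows = [
    "0110100001110",
    "0110100001010",
]
line_image = [[int(ch) for ch in row] for row in rows]

print(extract_words(line_image, min_gap=2))  # [(1, 5), (9, 12)]
```

With `min_gap=2` the one-column gap at column 3 stays inside the first word, while the four-column gap starting at column 5 splits the line into two words. In a real system the threshold would have to scale with the text size, which matters here because the system is described as size-independent.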
