Printed Persian Subword Recognition Using Wavelet Packet Descriptors
Author(s) - Samira Nasrollahi, Afshin Ebrahimi
Publication year - 2013
Publication title - Journal of Engineering
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.244
H-Index - 20
eISSN - 2314-4912
pISSN - 2314-4904
DOI - 10.1155/2013/465469
Subject(s) - Persian, pattern recognition (psychology), artificial intelligence, computer science, speech recognition, invariant (physics), font, feature vector, wavelet transform, optical character recognition, feature extraction, feature (linguistics), wavelet packet decomposition, wavelet, mathematics, image (mathematics), philosophy, linguistics, mathematical physics
In this paper, we present a new approach to offline optical character recognition (OCR) for printed Persian subwords using the wavelet packet transform. The proposed algorithm extracts font-invariant and size-invariant features from 87804 subwords in 4 fonts and 3 sizes. The feature vectors are compressed using principal component analysis (PCA). The compressed feature vectors form a pictorial dictionary in which each entry is the mean of one group, where a group consists of the same subword rendered in 4 fonts and 3 sizes. These features are then combined with dot features to recognize printed Persian subwords. To evaluate the feature extraction results, the algorithm was tested on a set of 2000 subwords from printed Persian text documents. An encouraging recognition rate of 97.9% was obtained at the subword level.
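The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the Haar filter, the normalized sub-band energy features, the Euclidean nearest-mean lookup, and every function name below are assumptions made for illustration. The PCA compression and the dot features are omitted, and input images are assumed to be already resized to a fixed size whose sides are divisible by 2 raised to the number of levels.

```python
def haar_split(img):
    """One 2D Haar step: return four half-size sub-bands (LL, LH, HL, HH)."""
    h, w = len(img) // 2, len(img[0]) // 2
    bands = [[[0.0] * w for _ in range(h)] for _ in range(4)]
    for i in range(h):
        for j in range(w):
            a, b = img[2 * i][2 * j], img[2 * i][2 * j + 1]
            c, d = img[2 * i + 1][2 * j], img[2 * i + 1][2 * j + 1]
            bands[0][i][j] = (a + b + c + d) / 2.0  # average (LL)
            bands[1][i][j] = (a - b + c - d) / 2.0  # column-to-column detail
            bands[2][i][j] = (a + b - c - d) / 2.0  # row-to-row detail
            bands[3][i][j] = (a - b - c + d) / 2.0  # diagonal detail
    return bands


def energy(img):
    """Sum of squared pixel values."""
    return sum(v * v for row in img for v in row)


def wavelet_packet_energies(img, levels):
    """Full wavelet packet tree: every sub-band is split again at each level.

    Returns one energy per leaf node, normalized by the image energy
    (an assumed normalization, not taken from the paper).
    """
    nodes = [img]
    for _ in range(levels):
        nodes = [band for node in nodes for band in haar_split(node)]
    total = energy(img) or 1.0
    return [energy(node) / total for node in nodes]


def build_dictionary(samples):
    """samples maps a subword label to its feature vectors (one per font/size).

    Each dictionary entry is the mean feature vector of its group.
    """
    return {
        label: [sum(col) / len(vectors) for col in zip(*vectors)]
        for label, vectors in samples.items()
    }


def recognize(features, dictionary):
    """Return the label whose mean vector is closest in Euclidean distance."""
    return min(
        dictionary,
        key=lambda label: sum(
            (x - y) ** 2 for x, y in zip(features, dictionary[label])
        ),
    )


# Toy 4x4 "subwords": vertical versus horizontal stripes (hypothetical data).
vertical = [[1, 0, 1, 0] for _ in range(4)]
vertical_bold = [[2, 0, 2, 0] for _ in range(4)]
horizontal = [[1, 1, 1, 1], [0, 0, 0, 0], [1, 1, 1, 1], [0, 0, 0, 0]]

dictionary = build_dictionary({
    "vertical": [wavelet_packet_energies(vertical, 2),
                 wavelet_packet_energies(vertical_bold, 2)],
    "horizontal": [wavelet_packet_energies(horizontal, 2)],
})
query = [[3, 0, 3, 0] for _ in range(4)]
print(recognize(wavelet_packet_energies(query, 2), dictionary))
```

Unlike a plain wavelet transform, which keeps splitting only the low-pass band, the packet tree here splits all four sub-bands at every level, so two levels yield 16 leaf nodes. In a fuller version, PCA would be applied to the feature vectors before the group means are computed.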