A clustering‐based feature selection framework for handwritten Indic script classification
Author(s) - Chatterjee Iman, Ghosh Manosij, Singh Pawan Kumar, Sarkar Ram, Nasipuri Mita
Publication year - 2019
Publication title - Expert Systems
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.365
H-Index - 38
eISSN - 1468-0394
pISSN - 0266-4720
DOI - 10.1111/exsy.12459
Subject(s) - computer science , feature selection , pattern recognition (psychology) , scripting language , cluster analysis , artificial intelligence , feature (linguistics) , identification (biology) , feature vector , field (mathematics) , selection (genetic algorithm) , feature extraction , filter (signal processing) , data mining , computer vision , mathematics , philosophy , linguistics , botany , pure mathematics , biology , operating system
India has numerous officially recognized scripts, so there is a primary need to categorize documents by the scripts used in them. Identifying the script used in a document is essential for handling it effectively, both manually and digitally. Script identification in document images is an important research problem in pattern recognition. At times, it suffers from the growing dimensionality of the feature vector and therefore requires an efficient feature selection technique. To address this, we propose a clustering‐based filter feature selection framework that extracts an optimal and effective feature subset from the original feature vector. The proposed methodology is evaluated on a script classification problem involving handwritten documents in 12 major Indic scripts, with experiments performed at the word, text‐line, and block levels. The experiments show a reasonable increase in classification accuracy using comparatively fewer features. The proposed framework is computationally inexpensive and can also be applied to other pattern recognition problems.
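The abstract does not specify which clustering algorithm or filter measure the framework uses. The following is therefore only a generic illustration of the clustering-based filter idea under stated assumptions. It is not the authors' method.

In this sketch, each feature is scored with a Fisher score, which is an assumed filter measure. Features whose absolute Pearson correlation with a higher-ranked feature reaches a threshold are grouped into that feature's cluster. One representative is kept per cluster. The names `fisher_score`, `abs_pearson`, `cluster_select`, and `threshold` are hypothetical.

```python
from math import sqrt
from statistics import mean, pvariance


def fisher_score(column, labels):
    """Assumed filter score: weighted between-class scatter over within-class scatter."""
    overall = mean(column)
    between = 0.0
    within = 0.0
    for c in set(labels):
        vals = [x for x, y in zip(column, labels) if y == c]
        n_c = len(vals)
        between += n_c * (mean(vals) - overall) ** 2
        within += n_c * pvariance(vals)
    if within == 0:
        # Perfectly separated classes score infinitely high; constant features score zero.
        return float("inf") if between > 0 else 0.0
    return between / within


def abs_pearson(a, b):
    """Absolute Pearson correlation between two feature columns (0.0 if either is constant)."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return abs(cov) / sqrt(va * vb)


def cluster_select(X, labels, threshold=0.9):
    """Hypothetical clustering-based filter selection.

    X is a list of samples, each a list of feature values. Features are visited in
    decreasing Fisher score. Each unassigned feature opens a new cluster and becomes
    its representative. Every unassigned feature correlated with it at or above
    `threshold` joins that cluster. Returns the sorted indices of the representatives.
    """
    n_features = len(X[0])
    columns = [[row[j] for row in X] for j in range(n_features)]
    scores = [fisher_score(col, labels) for col in columns]
    order = sorted(range(n_features), key=lambda j: scores[j], reverse=True)

    assigned = set()
    selected = []
    for j in order:
        if j in assigned:
            continue
        selected.append(j)  # j represents a new cluster
        assigned.add(j)
        for k in order:
            if k not in assigned and abs_pearson(columns[j], columns[k]) >= threshold:
                assigned.add(k)  # k is redundant with j and is dropped
    return sorted(selected)


if __name__ == "__main__":
    # Toy data: feature 1 is an exact multiple of feature 0; feature 2 is weakly related noise.
    X = [[1.0, 2.0, 5.0], [1.2, 2.4, 3.0], [3.0, 6.0, 4.0], [3.1, 6.2, 2.0]]
    y = [0, 0, 1, 1]
    print(cluster_select(X, y, threshold=0.9))
```

On the toy data, the redundant copy of feature 0 falls into the same cluster as feature 0, so only one of the two is kept alongside feature 2. Raising `threshold` keeps more features. Lowering it merges more features into fewer clusters. The greedy, relevance-first ordering is one simple design choice that ensures each cluster is represented by its most discriminative member.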