Subspace models for document script and language identification
Author(s) - Vikram T. N., Gowda K. Chidananda
Publication year - 2010
Publication title - International Journal of Imaging Systems and Technology
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.359
H-Index - 47
eISSN - 1098-1098
pISSN - 0899-9457
DOI - 10.1002/ima.20215
Subject(s) - computer science, scripting language, identification (biology), paragraph, subspace topology, natural language processing, artificial intelligence, language identification, natural language, programming language, world wide web, botany, biology
In this article, we explore the suitability of subspace models such as 2DPCA [Yang et al., IEEE Trans Pattern Anal Machine Intelligence 26 (2004), 131–137] and 2DFLD [Yang et al., Pattern Recogn 38 (2005), 1125–1129] for document script and language identification. The models are employed to identify language and script at both the paragraph and word levels. Extensive experiments show that they are robust enough to handle highly confusing scripts and that their performance does not degrade drastically in the presence of noise. Generic language identification is also attempted, covering languages of both Asian and European origin on a dataset of 20 different languages. © 2010 Wiley Periodicals, Inc. Int J Imaging Syst Technol, 20, 140–148, 2010
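As a rough illustration of the kind of subspace model the abstract refers to, the sketch below implements 2DPCA in the form described by Yang et al. (2004): an image scatter matrix is built directly from 2D image matrices, its leading eigenvectors form a projection, and classification uses a nearest-neighbour rule on the projected feature matrices. This is not the authors' pipeline. The function names (`fit_2dpca`, `project`, `feature_distance`, `classify`), the use of NumPy, and the synthetic striped images standing in for two scripts are all assumptions made for the example.

```python
import numpy as np


def fit_2dpca(images, d):
    """Learn a 2DPCA projection from a stack of images shaped (M, m, n).

    Returns the mean image and an n x d matrix whose columns are the
    eigenvectors of the image scatter matrix with the d largest eigenvalues.
    """
    images = np.asarray(images, dtype=float)
    mean = images.mean(axis=0)
    centered = images - mean
    # Image scatter matrix G = (1/M) * sum_k (A_k - mean)^T (A_k - mean), n x n.
    G = np.einsum("kij,kil->jl", centered, centered) / len(images)
    # eigh returns eigenvalues in ascending order, so reverse the columns.
    _, eigvecs = np.linalg.eigh(G)
    X = eigvecs[:, ::-1][:, :d]
    return mean, X


def project(image, X):
    """Project an m x n image onto the 2DPCA axes, giving an m x d feature matrix."""
    return np.asarray(image, dtype=float) @ X


def feature_distance(Y1, Y2):
    """Sum of Euclidean distances between corresponding feature columns."""
    return float(np.linalg.norm(Y1 - Y2, axis=0).sum())


def classify(image, train_features, train_labels, X):
    """Nearest-neighbour label for an image in the 2DPCA feature space."""
    Y = project(image, X)
    dists = [feature_distance(Y, F) for F in train_features]
    return train_labels[int(np.argmin(dists))]


# Synthetic stand-ins for two scripts (hypothetical data, not from the paper):
# "script_a" images have horizontal stripes, "script_b" images vertical stripes.
rng = np.random.default_rng(0)
size = 16
horizontal = np.zeros((size, size))
horizontal[::2, :] = 1.0
vertical = horizontal.T.copy()


def noisy(pattern):
    return pattern + rng.normal(scale=0.1, size=pattern.shape)


train_images = [noisy(horizontal) for _ in range(5)] + [noisy(vertical) for _ in range(5)]
train_labels = ["script_a"] * 5 + ["script_b"] * 5

mean_image, X = fit_2dpca(train_images, d=2)
train_features = [project(img, X) for img in train_images]

pred_a = classify(noisy(horizontal), train_features, train_labels, X)
pred_b = classify(noisy(vertical), train_features, train_labels, X)
print(pred_a, pred_b)
```

Because 2DPCA works on the image matrix itself rather than a flattened vector, the scatter matrix here is only 16 x 16, which keeps the eigendecomposition cheap. A real script or language identifier would replace the striped patterns with normalised paragraph or word images.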