Segmentation of Ancient Telugu Text Documents
Author(s) -
Srinivasa Rao A.V
Publication year - 2012
Publication title -
international journal of image graphics and signal processing
Language(s) - English
Resource type - Journals
eISSN - 2074-9082
pISSN - 2074-9074
DOI - 10.5815/ijigsp.2012.06.02
Subject(s) - telugu , segmentation , artificial intelligence , computer science , pattern recognition (psychology) , scripting language , process (computing) , line (geometry) , scale space segmentation , image segmentation , natural language processing , computer vision , information retrieval , mathematics , geometry , operating system
OCR of ancient document images remains a challenging task till date. Scanning process itself introduces deformation of document images. Cleaning process of these document images will result in information loss. Segmentation contributes an invariance process in OCR. Complex scripts, like derivatives of Brahmi, encounter many problems in the segmentation process. Segmentation of meaningful units, (instead of isolated patterns), revealed interesting trends. A segmentation technique for the ancient Telugu document image into meaningful units is proposed. The topological features of the meaningful units within the script line are adopted as a basis, while segmenting the text line. Horizontal profile pattern is convolved with Gaussian kernel. The statistical properties of meaningful units are explored by extensively analyzing the geometrical patterns of the meaningful unit. The efficiency of the proposed algorithm involving segmentation process is found to be 73.5% for the case of uncleaned document images.
Accelerating Research
Robert Robinson Avenue,
Oxford Science Park, Oxford
OX4 4GP, United Kingdom
Address
John Eccles HouseRobert Robinson Avenue,
Oxford Science Park, Oxford
OX4 4GP, United Kingdom