
A Holistic Approach to Urdu Language Word Recognition using Deep Neural Networks
Author(s) -
Hashim Raza Khan,
Muhammad Abul Hasan,
Majida Kazmi,
Nosheen Fayyaz,
Hasam Khalid,
Saad Ahmed Qazi
Publication year - 2021
Publication title -
Engineering, Technology and Applied Science Research
Language(s) - English
Resource type - Journals
eISSN - 2241-4487
pISSN - 1792-8036
DOI - 10.48084/etasr.4143
Subject(s) - urdu, cursive, computer science, natural language processing, artificial intelligence, hindi, optical character recognition, scope (computer science), convolutional neural network, character (mathematics), speech recognition, linguistics, mathematics, image (mathematics), philosophy, geometry, programming language
Urdu is one of the most widely spoken languages in the world. It is a Persianized standard register of the Hindi language with a considerable and valuable body of literature. While digital libraries are steadily replacing conventional libraries, a vast amount of Urdu literature is still handwritten. Digitizing this handwritten literature is essential to preserve it and make it more accessible. However, the scarcity of research on Urdu Optical Character Recognition (OCR) limits a digital library's scope to manual document search. The limited research in this area is mainly due to the complexity of the Urdu script. Unlike English, Urdu writing is cursive and bidirectional, and character shapes and sizes vary considerably depending on their position. Holistic word recognition has been found to be a better solution than many text segmentation techniques because it takes the complete word into account instead of segmenting it explicitly or implicitly. In this work, data for five different Urdu words were collected to train and test a convolutional neural network, which achieved 96% recognition accuracy.