
Balinese character recognition on mobile application based on Tesseract open source OCR engine
Author(s) -
I M D R Mudiarta,
I Made Dwita Atmaja,
I K Suharsana,
I W G S Antara,
I W P Bharaditya,
G A Suandirat,
Gede Indrawan
Publication year - 2020
Publication title -
Journal of Physics: Conference Series
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.21
H-Index - 85
eISSN - 1742-6596
pISSN - 1742-6588
DOI - 10.1088/1742-6596/1516/1/012017
Subject(s) - optical character recognition , computer science , character (mathematics) , android (operating system) , natural language processing , speech recognition , open source , font , character recognition , artificial intelligence , image (mathematics) , linguistics , software , programming language , philosophy , geometry , mathematics , operating system
Balinese script, a part of Balinese culture, is rarely used today. The Provincial Government of Bali is trying to preserve the Balinese language and script through Governor Regulation number 80 of 2018. This study aimed to help preserve the Balinese script through mobile technology, a current trend with worldwide coverage that supports ubiquitous learning. The research integrated the Tesseract open source Optical Character Recognition (OCR) engine into an Android application that converts images of Balinese characters into text. The input is a Balinese script image, either captured by the mobile camera or taken from an existing image. Using training data bundled with the application, it converts the input image into text that can be processed further. New Balinese script training data were created covering only the eighteen basic syllables of the Balinese script and its numbers. The application runs offline on mobile hardware that supports camera functions. In a test on 50 words, recognition accuracy was 62% on good-quality images using the Bali-Simbar font. The application can be developed further to recognize other parts of the character repertoire, i.e., vowels (Akśara Suara), semivowels (Arda Suara), additional syllables (Akśara Şwalalita), and sound killers (Pangangge Tengenan).
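The reported figure can be read as a word-level accuracy: the share of test words whose recognized text matches the expected text exactly. The following is a minimal sketch under that assumption; the abstract does not state how the 62% was computed, and the function name `word_accuracy` and the placeholder word lists are hypothetical, not taken from the study. If accuracy was measured per word, 62% of 50 words corresponds to 31 correctly recognized words.

```python
def word_accuracy(expected, recognized):
    """Return the fraction of positions where the recognized word
    equals the expected word exactly (assumed metric, see lead-in)."""
    if len(expected) != len(recognized):
        raise ValueError("expected and recognized must have the same length")
    if not expected:
        raise ValueError("at least one word is required")
    correct = sum(1 for e, r in zip(expected, recognized) if e == r)
    return correct / len(expected)


# Hypothetical placeholder data, not the study's test set:
# 50 test words, of which the first 31 are recognized correctly.
expected = [f"word{i}" for i in range(50)]
recognized = expected[:31] + ["?"] * 19

accuracy = word_accuracy(expected, recognized)
print(f"{accuracy:.0%}")
```

Exact string matching is the strictest choice: a word with a single wrong character counts as a miss. A character-level metric would generally give a higher figure for the same OCR output.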