
Combining the NER-OCR methods to improve information retrieval efficiency in the Indonesian posters
Author(s) -
Ahmad Syarif Rosidy,
Tubagus Mohammad Akhriza,
M.Kom Ir. Muchammad Husni
Publication year - 2020
Publication title -
Jurnal Teknologi dan Sistem Komputer
Language(s) - English
Resource type - Journals
eISSN - 2620-4002
pISSN - 2338-0403
DOI - 10.14710/jtsiskom.2020.13686
Subject(s) - computer science, indonesian, upload, named entity recognition, dissemination, information retrieval, optical character recognition, digital library, information extraction, artificial intelligence, natural language processing, world wide web, task (project management), engineering, image (mathematics), telecommunications, philosophy, linguistics, poetry, systems engineering, art, literature
Event organizers in Indonesia often use websites to disseminate information about their events through digital posters. However, manually transferring information from posters to websites is time-consuming, given the increasing number of posters uploaded. Moreover, information retrieval methods for Indonesian posters, such as Named Entity Recognition (NER), are still rarely discussed in the literature. Applying NER to Indonesian text is also difficult to make accurate, because Indonesian is a low-resource language with few corpora available as a reference. This study proposes a solution that reduces the time needed to extract information from digital posters. The solution combines Optical Character Recognition (OCR), which recognizes the text on a poster, with an NER method developed using a relevant training corpus to improve accuracy. Experiments on 50 test digital posters show that the system increases time efficiency by 94% while achieving 82-92% accuracy for several extracted information entities.
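The abstract describes a two-stage pipeline: OCR converts the poster image into text, and NER then labels entities in that text. The following is a minimal sketch of that structure only, under loudly stated assumptions: `run_ocr` is a hypothetical stub that returns fixed text instead of calling a real OCR engine, `toy_ner` is a regular-expression stand-in (not the trained NER model the authors describe), the `DATE`/`PHONE` labels are hypothetical, and `time_efficiency` and `entity_accuracy` are assumed metric definitions rather than formulas taken from the paper.

```python
import re
from dataclasses import dataclass
from typing import Callable


@dataclass
class Entity:
    label: str  # hypothetical label set: "DATE" or "PHONE"
    text: str


def run_ocr(image_path: str) -> str:
    """Hypothetical OCR stage. A real system would call an OCR engine on the
    image; this stub ignores the path and returns fixed poster text."""
    return "Seminar Nasional AI 12 Agustus 2020 Hubungi 0812-3456-7890"


MONTHS = ("Januari|Februari|Maret|April|Mei|Juni|Juli|Agustus|"
          "September|Oktober|November|Desember")


def toy_ner(text: str) -> list[Entity]:
    """Stand-in for a trained NER model (regular expressions, NOT the paper's
    method). Tags Indonesian-style dates and hyphenated phone numbers."""
    entities = [Entity("DATE", m.group())
                for m in re.finditer(rf"\d{{1,2}} (?:{MONTHS}) \d{{4}}", text)]
    entities += [Entity("PHONE", m.group())
                 for m in re.finditer(r"\b0\d{2,3}(?:-\d{3,4}){2}\b", text)]
    return entities


def extract_poster(image_path: str,
                   ocr: Callable[[str], str] = run_ocr,
                   ner: Callable[[str], list[Entity]] = toy_ner) -> list[Entity]:
    """Pipeline order: OCR first (image -> text), then NER (text -> entities)."""
    return ner(ocr(image_path))


def time_efficiency(manual_seconds: float, automated_seconds: float) -> float:
    """Assumed definition: fraction of processing time saved vs. manual entry."""
    return 1.0 - automated_seconds / manual_seconds


def entity_accuracy(correct: int, total: int) -> float:
    """Assumed definition: correctly extracted entities / expected entities."""
    return correct / total


if __name__ == "__main__":
    for entity in extract_poster("poster.jpg"):
        print(entity.label, entity.text)
```

Keeping the OCR and NER stages as injectable callables mirrors the abstract's framing of the system as a combination of two methods: either stage could be swapped for a real engine or a trained model without changing `extract_poster`. Under the assumed definition, automated processing that takes 6% of the manual time gives `time_efficiency(100.0, 6.0)` of about 0.94.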