
Historical Corpora Correlation based on RNN and DCNN
Author(s) -
Lin Wei,
Zhaoyu Lin
Publication year - 2021
Publication title -
Journal of Physics: Conference Series
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.21
H-Index - 85
eISSN - 1742-6596
pISSN - 1742-6588
DOI - 10.1088/1742-6596/1873/1/012048
Subject(s) - computer science , digitization , artificial intelligence , natural language processing , transcription (linguistics) , recurrent neural network , task (project management) , representation (politics) , german , character (mathematics) , linguistics , artificial neural network , computer vision , philosophy , management , politics , political science , law , economics , geometry , mathematics
Correcting digitized historical corpora is a crucial task for historical research; however, scan quality, book layout, and visual similarity between characters can all degrade recognition quality. Optical character recognition (OCR) is at the forefront of digitization projects for cultural heritage preservation, and its main task is to convert characters from their visual form into their textual representation. In this paper, we propose a model combining a recurrent neural network (RNN) and a deep convolutional network (DCNN) to correct OCR transcription errors. An experiment on a corpus of historical German-language books shows that the model is robust in capturing diverse OCR transcription errors.