
Digital extraction of knowledge from early-modern books
Author(s) - Kazuki Joe
Publication year - 2021
Publication title - Impact
Language(s) - English
Resource type - Journals
eISSN - 2398-7081
pISSN - 2398-7073
DOI - 10.21820/23987073.2021.3.89
Subject(s) - newspaper , optical character recognition , task (project management) , computer science , writing style , character (mathematics) , world wide web , multimedia , artificial intelligence , media studies , engineering , sociology , literature , art , geometry , mathematics , systems engineering , image (mathematics)
As information and communications technology has advanced, interest has grown in digitally archiving books and other materials that have never previously been archived in this way. This benefits researchers, teachers, students and the general public by giving them easy access to useful historical information. The digital archiving of old newspapers is a work in progress, but obstacles remain: optical character recognition (OCR), the main method used to convert scanned materials to text, struggles with typefaces from, for example, 1850, so full-text search of such material is not currently possible. Professor Kazuki Joe, Department of Information and Computer Sciences, Nara Women's University, Japan, leads a team of researchers who are working to make full-text search possible for early-modern books, magazines and newspapers. The task is especially difficult because the team is working with Japanese texts, and the early-modern writing style in Japan differs from that of today. The researchers therefore first focused on the automatic conversion of letterpress book images into text, and then realised the need for automatic translation of early-modern literary texts into present-day colloquial language.