Open Access
The IMPACT project Polish Ground-Truth texts as a DjVu corpus
Author(s) - Janusz S. Bień
Publication year - 2014
Publication title - Cognitive Studies
Language(s) - English
Resource type - Journals
eISSN - 2392-2397
pISSN - 2080-7147
DOI - 10.11649/cs.2014.008
Subject(s) - character (mathematics) , ground truth , computer science , natural language processing , transcription (linguistics) , license , corpus linguistics , linguistics , artificial intelligence , philosophy , mathematics , geometry , operating system
The purpose of the paper is twofold. The first is to describe the already implemented idea of DjVu corpora, i.e. corpora that consist of both scanned images and a transcription of the texts, in which each word is linked to its occurrences in the scans. The second is to present a case study: a corpus of almost 5,000 pages of Polish historical texts dating from 1570 to 1756, which is practically the very first corpus of historical Polish. The tools described are general-purpose and freely available under the GNU GPL license, so they can also be used for other purposes.
