"Q i-jtb the Raven": Taking Dirty OCR Seriously
Author(s) - Ryan Cordell
Publication year - 2017
Publication title - Book History
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.102
eISSN - 1529-1499
pISSN - 1098-7371
DOI - 10.1353/bh.2017.0006
Subject(s) - digitization, optical character recognition, computer science, scale (ratio), history, paratext, character (mathematics), documentation, literature, artificial intelligence, art, cartography, geometry, mathematics, image (mathematics), computer vision, programming language, geography
This article argues that scholars must understand mass digitized texts as assemblages of new editions, subsidiary editions, and impressions of their historical sources, and that these various parts require sustained bibliographic analysis and description. To adequately theorize any research conducted in large-scale text archives—including research that includes primary or secondary sources discovered through keyword search—we must avoid the myth of surrogacy proffered by page images and instead consider directly the text files they overlay. Focusing on the OCR (optical character recognition) from which most large-scale historical text data derives, this article argues that the results of this "automatic" process are in fact new editions of their source texts that offer unique insights into both the historical texts they remediate and the more recent era of their remediation. The constitution and provenance of digitized archives are, to some extent at least, knowable and describable. Just as details of type, ink, or paper, or paratext such as printer's records can help us establish the histories under which a printed book was created, details of format, interface, and even grant proposals can help us establish the histories of corpora created under conditions of mass digitization.