Cleaning OCR’d text with Regular Expressions
Author(s) - Laura Turner O'Hara
Publication year - 2013
Publication title - The Programming Historian
Language(s) - English
Resource type - Journals
ISSN - 2397-2068
DOI - 10.46430/phen0024
Subject(s) - optical character recognition, usable, computer science, artificial intelligence, character (mathematics), natural language processing, character recognition, speech recognition, image (mathematics), world wide web, mathematics, geometry
Optical Character Recognition (OCR)—the conversion of scanned images to machine-encoded text—has proven a godsend for historical research. This lesson will help you clean up OCR'd text to make it more usable.