
Cleaning OCR’d text with Regular Expressions
Author(s) - Laura Turner O'Hara
Publication year - 2013
Publication title - The Programming Historian
Language(s) - English
Resource type - Journals
ISSN - 2397-2068
DOI - 10.46430/phen0024
Subject(s) - optical character recognition, usable, computer science, artificial intelligence, character (mathematics), natural language processing, character recognition, speech recognition, image (mathematics), world wide web, mathematics, geometry
Optical Character Recognition (OCR)—the conversion of scanned images to machine-encoded text—has proven a godsend for historical research. This lesson will help you clean up OCR'd text to make it more usable.
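
As a rough illustration of the kind of cleanup the abstract describes, the sketch below applies a few regular expressions to a short OCR-like string using Python's standard `re` module. The function name `clean_ocr_text`, the sample string, and the specific patterns are assumptions made for this example; they are not taken from the lesson itself, and real OCR output will usually need patterns tuned to its particular errors.

```python
import re


def clean_ocr_text(text: str) -> str:
    """Apply a few generic regex cleanups to OCR'd text (illustrative only)."""
    # Rejoin words hyphenated across a line break: "recog-\nnition" -> "recognition".
    # Caveat: this also joins genuine hyphenated compounds that happen to
    # break at the end of a line.
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Collapse runs of spaces and tabs into a single space.
    text = re.sub(r"[ \t]+", " ", text)
    # Remove stray spaces that OCR inserted before punctuation.
    text = re.sub(r" +([.,;:!?])", r"\1", text)
    # Drop a single space on either side of a line break.
    text = re.sub(r" ?\n ?", "\n", text)
    # Reduce three or more consecutive newlines to one blank line.
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()


# Hypothetical sample, invented for this sketch.
sample = "Optical  character recog-\nnition is   use-\nful .\n\n\n\nNext   paragraph."
print(clean_ocr_text(sample))
```

Each substitution targets one narrow class of noise, so the patterns can be reordered, removed, or replaced independently as the source material requires.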