
Generating an Ordered Data Set from an OCR Text File
Author(s) - J.J. Crump
Publication year - 2014
Publication title - The Programming Historian
Language(s) - English
Resource type - Journals
ISSN - 2397-2068
DOI - 10.46430/phen0036
Subject(s) - Python (programming language), parsing, computer science, metadata, set (abstract data type), information retrieval, natural language processing, programming language, artificial intelligence, world wide web
This tutorial illustrates strategies for taking raw OCR output from a scanned text, parsing it to isolate and correct essential elements of metadata, and generating an ordered data set (a Python dictionary) from it.
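
The workflow summarized above can be sketched roughly as follows. This is a minimal illustration, not the tutorial's own code: the sample text, the header pattern, the list of letter-for-digit confusions, and the names `RAW_OCR`, `DIGIT_FIXES`, `HEADER`, and `build_entries` are all assumptions made for the example. It assumes each entry begins with a line of the form "number, period, title" and that the OCR step has sometimes misread digits as letters.

```python
import re

# Hypothetical OCR output: each entry starts with a number and a period,
# but OCR has misread some digits as letters (l for 1, O for 0).
RAW_OCR = """\
1. Donation of land
Text of the first entry.
2. Sale of a vineyard
Text of the second entry,
continued on a new line.
l0. Lease of a mill
Text of the tenth entry.
"""

# Assumed letter/digit confusions to repair in the entry number only.
DIGIT_FIXES = str.maketrans({"l": "1", "I": "1", "O": "0", "o": "0"})

# A header line: one to four digits or digit look-alikes, a period, a title.
HEADER = re.compile(r"^([0-9lIOo]{1,4})\.\s+(.+)$")


def build_entries(raw_text):
    """Return a dict mapping entry number -> {'title': ..., 'text': ...}."""
    entries = {}
    current = None
    for line in raw_text.splitlines():
        line = line.strip()
        if not line:
            continue
        match = HEADER.match(line)
        if match:
            # Correct the misread number, then start a new entry.
            current = int(match.group(1).translate(DIGIT_FIXES))
            entries[current] = {"title": match.group(2), "text": ""}
        elif current is not None:
            # Any other line is body text belonging to the current entry.
            body = entries[current]["text"]
            entries[current]["text"] = (body + " " + line).strip()
    return entries


entries = build_entries(RAW_OCR)
for number in sorted(entries):
    print(number, entries[number]["title"])
```

Keying the dictionary on the corrected entry number means the entries can be read back in order with `sorted(entries)`, even when the OCR output garbled the numbering. A real source would likely need a stricter header pattern, since a body line that happens to begin with a look-alike letter and a period would be mistaken for a header here.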