Generating an Ordered Data Set from an OCR Text File
Author(s) - J. J. Crump
Publication year - 2014
Publication title - The Programming Historian
Language(s) - English
Resource type - Journals
ISSN - 2397-2068
DOI - 10.46430/phen0036
Subject(s) - Python (programming language), parsing, computer science, metadata, set (abstract data type), information retrieval, natural language processing, programming language, artificial intelligence, world wide web
This tutorial illustrates strategies for taking raw OCR output from a scanned text, parsing it to isolate and correct essential elements of metadata, and generating an ordered data set (a Python dictionary) from it.
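The general idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the tutorial's own code: the sample text, the header pattern, the list of OCR character confusions, and the names `RAW_OCR`, `HEADER`, `DIGIT_FIXES`, and `parse_entries` are all hypothetical.

```python
import re

# Hypothetical raw OCR output: numbered entries whose header line ends in a year.
# The second header contains a typical OCR error ("l" read in place of "1").
RAW_OCR = """\
1. Grant of land near the river. 1192
Witnesses named in the text.
2. Confirmation of the grant. 1l95
3. Sale of a mill. 1201
"""

# Assumed header layout: entry number, period, summary, four-character year.
# The year class tolerates letters that OCR commonly confuses with digits.
HEADER = re.compile(r"^(\d+)\.\s+(.*?)\s+([0-9lIO]{4})$")

# Assumed character confusions to correct inside the year field only.
DIGIT_FIXES = str.maketrans({"l": "1", "I": "1", "O": "0"})


def parse_entries(text):
    """Return a dict keyed by entry number, in order of appearance."""
    entries = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        match = HEADER.match(line)
        if match:
            # A header line starts a new entry; correct the year before storing it.
            current = int(match.group(1))
            entries[current] = {
                "summary": match.group(2),
                "year": int(match.group(3).translate(DIGIT_FIXES)),
                "text": [],
            }
        elif current is not None:
            # Any other line is body text belonging to the most recent entry.
            entries[current]["text"].append(line)
    return entries


records = parse_entries(RAW_OCR)
```

Since Python 3.7 a plain dictionary preserves insertion order, so iterating over `records` yields the entries in the order they appear in the source text.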