Creating and validating a large image database for METTREC
Author(s) - Michael D. Garris, William W. Klein
Publication year - 1997
Language(s) - English
Resource type - Reports
DOI - 10.6028/nist.ir.6090
Subject(s) - computer science, database, image (mathematics), information retrieval, artificial intelligence
The National Institute of Standards and Technology (NIST) is setting up a new series of conferences named the Metadata Text Retrieval Conferences (METTREC). The series will focus on evaluating document conversion using optical character recognition (OCR) and information retrieval (IR) technologies. Evaluations will be designed to investigate the impact of machine recognition errors on information retrieval and to determine which interfaces are appropriate for integrating the two technologies. To implement the conference, NIST requires databases that can be used for conference evaluations and has chosen the Federal Register as the initial document source. The Federal Register is a large, complete set of documents containing metadata that will allow quantitative evaluation of recognition and retrieval technologies. This paper describes the activities associated with scanning the Federal Register and validating the document images within the database. Image validation includes translating filenames, assuring image integrity, and verifying correct page sequences. To reduce the cost of validation, human effort was minimized by combining OCR with high-speed visual adjudication of images by an operator. This process minimizes the expensive handling of paper to validate document image collections.
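
The page-sequence step mentioned in the abstract can be illustrated with a minimal sketch. This is not the report's actual procedure, which the abstract does not detail. It assumes each scanned image has a page number read by OCR (or `None` when OCR could not read one) and that one image corresponds to one page. The function name `find_sequence_breaks` and the filenames are hypothetical.

```python
from typing import List, Optional, Tuple


def find_sequence_breaks(pages: List[Tuple[str, Optional[int]]]) -> List[str]:
    """Return the image names an operator should review.

    `pages` lists (image name, OCR page number) in scan order.
    A page number of None means OCR could not read one (assumption).
    """
    flagged = []
    expected = None  # page number we expect on the next image
    for name, page_no in pages:
        if page_no is None:
            # Unreadable page number: send to the operator, but keep
            # counting under the one-image-per-page assumption.
            flagged.append(name)
            if expected is not None:
                expected += 1
            continue
        if expected is not None and page_no != expected:
            # Gap, duplicate, or out-of-order page.
            flagged.append(name)
        expected = page_no + 1
    return flagged


scan = [("a.img", 1), ("b.img", 2), ("c.img", None),
        ("d.img", 4), ("e.img", 6), ("f.img", 7)]
print(find_sequence_breaks(scan))
```

In this sketch only the flagged images go to the operator for visual adjudication, which reflects the cost-reduction idea in the abstract: automatic checks handle the bulk of the pages, and human attention is reserved for the exceptions.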
