Creating and validating a large image database for METTREC

Michael D. Garris; William W. Klein

Publication year - 1997

Language(s) - English

Resource type - Reports

DOI - 10.6028/nist.ir.6090

Subject(s) - computer science, database, image (mathematics), information retrieval, artificial intelligence

The National Institute of Standards and Technology (NIST) is setting up a new series of conferences, the Metadata Text Retrieval Conferences (METTREC). It will focus on evaluating document conversion using optical character recognition (OCR) and information retrieval (IR) technologies. Evaluations will be designed to investigate the impact of machine recognition errors on information retrieval and to determine which interfaces are appropriate for integrating the two technologies. Implementing this conference requires databases suitable for the conference evaluations, and NIST has chosen the Federal Register as the initial document source. The Federal Register is a large, complete set of documents containing metadata that will allow quantitative evaluation of recognition and retrieval technologies. This paper describes the activities associated with scanning the Federal Register and validating the document images within the database. Image validation includes translating filenames, assuring image integrity, and verifying correct page sequences. To reduce the cost of validation, we minimized the human effort required by exploiting OCR and having an operator perform high-speed visual adjudication directly from the images. This process minimizes the expensive handling of paper when validating document image collections.
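The page-sequence check mentioned in the abstract can be illustrated with a small sketch. It assumes, hypothetically, that OCR has already extracted a printed page number from each image (or failed to), that images are listed in scan order, and that consecutive images should carry consecutive page numbers. The names `PageImage` and `find_sequence_problems` are illustrative and do not come from the report. The sketch does not resolve discrepancies itself; it only flags images for an operator, in the spirit of combining OCR with visual adjudication.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class PageImage:
    """One scanned page image (hypothetical structure for illustration)."""
    filename: str
    ocr_page_number: Optional[int]  # page number read by OCR; None if unreadable


def find_sequence_problems(pages: list[PageImage], first_page: int) -> list[tuple[str, str]]:
    """Return (filename, reason) pairs for images an operator should review.

    Assumes images are in scan order and that consecutive images should
    carry consecutive page numbers starting at first_page.
    """
    problems: list[tuple[str, str]] = []
    expected = first_page
    seen: set[int] = set()
    for page in pages:
        n = page.ocr_page_number
        if n is None:
            # OCR could not read a page number: defer to visual adjudication
            # and assume this image occupies the expected position.
            problems.append((page.filename, "unreadable page number"))
            expected += 1
            continue
        if n in seen:
            problems.append((page.filename, f"duplicate page {n}"))
        elif n != expected:
            problems.append((page.filename, f"expected page {expected}, found {n}"))
        seen.add(n)
        # Resynchronize on the number actually read, so one missing page
        # is reported once rather than for every later image.
        expected = n + 1
    return problems


if __name__ == "__main__":
    scanned = [
        PageImage("a.tif", 1),
        PageImage("b.tif", 2),
        PageImage("c.tif", 4),     # page 3 missing
        PageImage("d.tif", 4),     # page 4 scanned twice
        PageImage("e.tif", None),  # OCR failed on the page number
        PageImage("f.tif", 6),
    ]
    for filename, reason in find_sequence_problems(scanned, first_page=1):
        print(f"{filename}: {reason}")
    # prints:
    # c.tif: expected page 3, found 4
    # d.tif: duplicate page 4
    # e.tif: unreadable page number
```

Resynchronizing on each readable page number keeps the review list short: a single dropped or duplicated page produces one flag for the operator instead of a cascade of mismatches through the rest of the volume.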