Reading handwritten phrases on U.S. census forms
Author(s) - Madhvanath S., Govindaraju V., Srihari S. N.
Publication year - 1996
Publication title - International Journal of Imaging Systems and Technology
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.359
H-Index - 47
eISSN - 1098-1098
pISSN - 0899-9457
DOI - 10.1002/(sici)1098-1098(199624)7:4<312::aid-ima6>3.0.co;2-b
Commercial form-reading systems for extracting data from forms do not meet acceptable accuracy requirements on forms filled out by hand. Several important form-processing applications involve the automated reading of handwritten responses; U.S. Census forms are a case in point. A database of form images containing actual responses received by the U.S. Census Bureau was made available by the National Institute of Standards and Technology (NIST) in December 1993. Several factors combine to make reading these forms challenging. The quality of the form images is often poor, and the handwritten responses are very loosely constrained in writing style, response format, and choice of text. The lexicons provided are large (10,000-50,000 entries), yet their coverage is incomplete (60%-70%). In this article we discuss our approach to automating the reading of census forms. The subtasks of field extraction and phrase recognition are described, and multiclassifier control strategies for phrase recognition are presented. With no rejects allowed, the system's error rate is 59%; the incomplete coverage of the lexicon imposes a lower bound of 40% on that rate. The article concludes with a discussion of experimental results and directions for future research. © 1996 John Wiley & Sons, Inc.
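The 40% floor follows from lexicon coverage. If a recognizer may only output lexicon entries and never rejects, every response missing from the lexicon is necessarily misread, so the error rate is at least 1 minus the coverage (a 40% floor corresponds to about 60% coverage). A minimal Python sketch of that arithmetic follows. It is an illustration under stated assumptions, not the authors' system: the function names and the toy occupation data are hypothetical, and the stand-in recognizer uses `difflib` string similarity rather than the paper's multiclassifier phrase recognition.

```python
import difflib
from typing import Sequence


def lexicon_coverage(truths: Sequence[str], lexicon: set[str]) -> float:
    """Fraction of ground-truth responses that appear verbatim in the lexicon."""
    if not truths:
        raise ValueError("need at least one response")
    return sum(t in lexicon for t in truths) / len(truths)


def error_lower_bound(truths: Sequence[str], lexicon: set[str]) -> float:
    """A recognizer that can only output lexicon entries and never rejects
    is wrong on every response that is missing from the lexicon."""
    return 1.0 - lexicon_coverage(truths, lexicon)


def recognize(read: str, lexicon: set[str]) -> str:
    """Hypothetical stand-in recognizer: return the lexicon entry most similar
    to a noisy character-level reading (difflib ratio, not the paper's method)."""
    # cutoff=0.0 forces a best guess for every input, i.e. no rejects allowed.
    return difflib.get_close_matches(read, sorted(lexicon), n=1, cutoff=0.0)[0]


def error_rate(reads: Sequence[str], truths: Sequence[str], lexicon: set[str]) -> float:
    """Fraction of responses whose recognized phrase differs from the truth."""
    wrong = sum(recognize(r, lexicon) != t for r, t in zip(reads, truths))
    return wrong / len(truths)


if __name__ == "__main__":
    # Hypothetical toy data, not drawn from the NIST census database.
    lexicon = {"TRUCK DRIVER", "REGISTERED NURSE", "RETAIL SALES"}
    truths = ["TRUCK DRIVER", "REGISTERED NURSE", "RETAIL SALES",
              "DOG GROOMER", "TRUCK DRIVER"]
    reads = ["TRUCK DRIVR", "REGISTRED NURSE", "RETAIL SALS",
             "DOG GROOMER", "TRUK DRIVER"]
    print(f"coverage    = {lexicon_coverage(truths, lexicon):.2f}")  # 0.80
    print(f"lower bound = {error_lower_bound(truths, lexicon):.2f}")  # 0.20
    print(f"error rate  = {error_rate(reads, truths, lexicon):.2f}")  # 0.20
```

In this toy run every in-lexicon response is recovered, so the error rate sits exactly at the bound; a real recognizer also misreads some in-lexicon phrases, which is why a measured rate such as 59% can lie well above a 40% floor.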