Scalable ranked retrieval using document images

Author(s) - Rajiv Jain, Douglas W. Oard, David Doermann

Open Access

Publication year - 2013

Publication title - Proceedings of SPIE, the International Society for Optical Engineering / Proceedings of SPIE

Language(s) - English

Resource type - Conference proceedings

SCImago Journal Rank - 0.192

H-Index - 176

eISSN - 1996-756X

pISSN - 0277-786X

DOI - 10.1117/12.2038656

Subject(s) - computer science, scalability, information retrieval, database

Despite the explosion of text on the Internet, hard-copy documents that have been scanned as images still play a significant role in some tasks. The best method for ranked retrieval on a large corpus of document images, however, remains an open research question. The most common approach has been to perform text retrieval using terms generated by optical character recognition (OCR). This paper, by contrast, examines whether a scalable, segmentation-free image retrieval algorithm, which matches sub-images containing text or graphical objects, can provide additional benefit in satisfying a user’s information needs on a large, real-world dataset. Results on 7 million scanned pages from the CDIP v1.0 test collection show that content-based image retrieval finds a substantial number of documents that text retrieval misses and that, when used as a basis for relevance feedback, it can yield improvements in retrieval effectiveness.
