Open Access
Fine-Grained Image Retrieval: the Text/Sketch Input Dilemma
Author(s) -
Jifei Song,
Yi-Zhe Song,
Tao Xiang,
Timothy M. Hospedales
Publication year - 2017
Language(s) - English
Resource type - Conference proceedings
DOI - 10.5244/c.31.45
Subject(s) - sketch , computer science , dilemma , image retrieval , information retrieval , image (mathematics) , artificial intelligence , computer vision , algorithm , mathematics , geometry
Fine-grained image retrieval (FGIR) enables a user to search for a photo of an object instance based on a mental picture. Depending on how the user describes the object, two general approaches exist: sketch-based FGIR and text-based FGIR, each with its own pros and cons. However, no attempt has been made to systematically investigate how informative each of these two input modalities is, and more importantly, whether they are complementary to each other and thus should be modelled jointly. In this work, we introduce for the first time a multi-modal FGIR dataset in which both sketches and sentence descriptions are provided as query modalities. A multi-modal quadruplet deep network is formulated to jointly model the sketch and text input modalities as well as the photo output modality. We show that on its own the sketch modality is much more informative than text, and that each modality benefits the other when they are modelled jointly.
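To make the quadruplet idea concrete: a quadruplet here pairs two query embeddings (one from a sketch, one from a text description) with a matching photo embedding and a non-matching photo embedding, and a margin-based ranking objective pushes each query modality closer to the true photo than to the distractor. The snippet below is a minimal illustrative sketch of such a loss, not the paper's actual formulation; the function name, the Euclidean distance choice, and the margin value are all assumptions.

```python
import math

def l2(a, b):
    # Euclidean distance between two embedding vectors of equal length.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def quadruplet_loss(sketch, text, photo_pos, photo_neg, margin=0.1):
    """Hypothetical margin-based quadruplet loss (illustrative only).

    Each query modality (sketch, text) should rank the matching photo
    (photo_pos) above the non-matching photo (photo_neg) by at least
    `margin`; violations incur a hinge penalty.
    """
    loss_sketch = max(0.0, margin + l2(sketch, photo_pos) - l2(sketch, photo_neg))
    loss_text = max(0.0, margin + l2(text, photo_pos) - l2(text, photo_neg))
    return loss_sketch + loss_text
```

When the matching photo is already well separated from the distractor in both modalities, the loss is zero; when either the sketch or text embedding sits closer to the wrong photo, a positive penalty drives the joint embedding apart during training.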
