Snap-and-ask

Wei Zhang; Lei Pang; ChongWah Ngo

Open Access

Author(s) - Wei Zhang, Lei Pang, ChongWah Ngo

Publication year - 2012

Publication title - Proceedings of the 20th ACM International Conference on Multimedia

Language(s) - English

Resource type - Conference proceedings

DOI - 10.1145/2393347.2393432

Subject(s) - question answering, computer science, parsing, information retrieval, ranking (information retrieval), consistency (knowledge bases), matching (statistics), artificial intelligence, landmark, natural language processing, mathematics, statistics

In real life, it is easier to provide a visual cue when asking a question about a possibly unfamiliar topic, for example, "Where was this crop circle found?". Providing an image of the instance is far more convenient than typing a verbose description of its visual properties, especially when the name of the query instance is not known. Nevertheless, having to identify the visual instance before processing the question and eventually returning the answer makes multimodal question answering technically challenging. This paper addresses the problem of visual-to-text naming through the paradigm of answering-by-search in a two-stage computational framework composed of instance search (IS) and similar question ranking (QR). In IS, names of the instances are inferred from similar visual examples retrieved from a million-scale image dataset. To recall instances with non-planar and non-rigid shapes, spatial configurations that emphasize topology consistency while allowing for local variations in matches are incorporated. In QR, the candidate names of the instance are statistically identified from the search results and used directly to retrieve similar questions from community-contributed QA (cQA) archives. By parsing questions into syntactic trees, fuzzy matching between the inquirer's question and cQA questions is performed to locate answers and recommend related questions to the inquirer. The proposed framework is evaluated on a wide range of visual instances (e.g., fashion, art, food, pet, logo, and landmark) over various QA categories (e.g., factoid, definition, how-to, and opinion).
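The two-stage flow described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function names, the similarity-weighted voting rule, and the data layout are all assumptions made for illustration, and token-level sequence matching from Python's standard `difflib` stands in for the paper's syntactic-tree fuzzy matching. The visual search itself is assumed to have already produced scored results with tags.

```python
import re
from collections import Counter
from difflib import SequenceMatcher


def infer_instance_names(search_results, top_k=1):
    """Stage 1 (IS), simplified: vote for candidate names.

    search_results is a hypothetical list of (similarity, tags) pairs
    returned by some image search; each tag receives a vote weighted by
    the similarity of the image it came from.
    """
    votes = Counter()
    for similarity, tags in search_results:
        for tag in {t.lower() for t in tags}:
            votes[tag] += similarity
    return [name for name, _ in votes.most_common(top_k)]


def _tokens(text):
    # Lowercased alphanumeric word tokens.
    return re.findall(r"[a-z0-9]+", text.lower())


def rank_questions(question, names, archive):
    """Stage 2 (QR), simplified: rank archived questions.

    archive is a hypothetical list of (question, answer) pairs. Only
    questions mentioning a candidate name are kept; they are scored by
    token-sequence similarity to the inquirer's question (an assumed
    stand-in for syntactic-tree matching).
    """
    query = _tokens(question)
    ranked = []
    for cqa_question, answer in archive:
        if not any(name in cqa_question.lower() for name in names):
            continue
        score = SequenceMatcher(None, query, _tokens(cqa_question)).ratio()
        ranked.append((score, cqa_question, answer))
    ranked.sort(key=lambda item: item[0], reverse=True)
    return ranked
```

Given scored image results and a small archive, calling `infer_instance_names` and then passing its output to `rank_questions` returns the archived questions that mention the inferred name, ordered by how closely their wording follows the inquirer's question.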
