Object sequences: encoding categorical and spatial information for a yes/no visual question answering task


Open Access

Author(s) - Garg Shivam, Srivastava Rajeev

Publication year - 2018

Publication title - IET Computer Vision

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 0.38

H-Index - 37

eISSN - 1751-9640

pISSN - 1751-9632

DOI - 10.1049/iet-cvi.2018.5226

Subject(s) - computer science , encoding (memory) , artificial intelligence , object (grammar) , question answering , categorical variable , task (project management) , natural language processing , embedding , block (permutation group theory) , pattern recognition (psychology) , information retrieval , machine learning , geometry , mathematics , management , economics

Visual question answering (VQA) has become a popular task in recent years. Solving it effectively requires understanding both the visual content of the image and the language information in the text-based question. In this study, the authors propose a novel method that encodes the visual information (categorical and spatial object information) of all the objects present in the image into a sequential format, which they call an object sequence. These object sequences can then be processed by a neural network. The authors experiment with multiple techniques for obtaining a joint embedding from the visual features (in the form of object sequences) and the language-based features obtained from the question. They also provide a detailed analysis of the performance of a neural network architecture that uses object sequences on the Oracle task of the GuessWhat dataset (a yes/no VQA task), and benchmark it against the baseline.
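The abstract does not spell out the exact encoding, so the following is only a minimal sketch of what an object sequence could look like. It assumes each object is represented by a one-hot category vector concatenated with an 8-dimensional bounding-box descriptor (coordinates rescaled to [-1, 1]), and that objects are ordered left to right by the left edge of their boxes. The category list and the names `DetectedObject`, `one_hot`, `spatial_features`, and `object_sequence` are hypothetical; the ordering and normalization are assumptions, not details taken from the paper.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical category vocabulary, used only for this illustration.
CATEGORIES = ["person", "dog", "chair", "cup"]


@dataclass
class DetectedObject:
    category: str
    # Assumed box convention: (x, y, width, height) in pixels, top-left origin.
    bbox: Tuple[float, float, float, float]


def one_hot(category: str) -> List[float]:
    """Categorical part: one-hot vector over CATEGORIES (raises ValueError if unknown)."""
    vec = [0.0] * len(CATEGORIES)
    vec[CATEGORIES.index(category)] = 1.0
    return vec


def spatial_features(bbox: Tuple[float, float, float, float],
                     img_w: float, img_h: float) -> List[float]:
    """Spatial part: 8-d box descriptor with coordinates rescaled to [-1, 1].

    Assumed layout: x_min, y_min, x_max, y_max, x_center, y_center, width, height.
    """
    x, y, w, h = bbox
    x_min, x_max = 2.0 * x / img_w - 1.0, 2.0 * (x + w) / img_w - 1.0
    y_min, y_max = 2.0 * y / img_h - 1.0, 2.0 * (y + h) / img_h - 1.0
    return [x_min, y_min, x_max, y_max,
            (x_min + x_max) / 2.0, (y_min + y_max) / 2.0,
            x_max - x_min, y_max - y_min]


def object_sequence(objects: List[DetectedObject],
                    img_w: float, img_h: float) -> List[List[float]]:
    """Encode all objects in an image as one sequence of feature vectors.

    Each step is [one-hot category | spatial features]. Objects are sorted by the
    left edge of their box, ties broken by the top edge (an assumed ordering).
    """
    ordered = sorted(objects, key=lambda o: (o.bbox[0], o.bbox[1]))
    return [one_hot(o.category) + spatial_features(o.bbox, img_w, img_h)
            for o in ordered]


if __name__ == "__main__":
    objs = [DetectedObject("dog", (50, 20, 20, 40)),
            DetectedObject("person", (10, 10, 30, 60))]
    for step in object_sequence(objs, img_w=100, img_h=100):
        print([round(v, 2) for v in step])
```

Each row of the result is one time step, so the whole list could be fed to a sequence encoder (for example a recurrent network) and then combined with a question embedding. Which joint-embedding techniques the authors actually compared is not stated in the abstract, so the fusion step is deliberately left out here.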