Compositional models for VQA: Can neural module networks really count?
Author(s) -
Gabriela Sejnova,
Michael Tesař,
Michal Vavrečka
Publication year - 2018
Publication title -
Procedia Computer Science
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.334
H-Index - 76
ISSN - 1877-0509
DOI - 10.1016/j.procs.2018.11.110
Subject(s) - computer science , artificial intelligence , machine learning , question answering , visual reasoning , artificial neural network , humanoid robot , robot , architecture
Large neural networks trained in an end-to-end fashion usually fail to generalize to novel inputs that were not included in the training data. In contrast, biologically inspired compositional models offer a more robust solution due to adaptive chaining of logical operations performed by specialized modules. In this paper, we present an implementation of a cognitive architecture based on the End-to-End Module Networks (N2NMNs) model [9] in the humanoid robot Pepper. The architecture is focused on the Visual Question Answering (VQA) task, in which the robot answers natural-language questions about a seen image. We trained the system on the synthetic CLEVR dataset [10] and tested it on both synthetic images and real-world situations with CLEVR-like objects. We compare the results and discuss the decrease in accuracy in real-world situations. Furthermore, we propose a new evaluation method, in which we test whether the model's counts of objects in each category are consistent with the overall number of seen objects. In summary, our results show that current visual reasoning models are still far from being applicable in everyday life.
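The consistency evaluation described above can be sketched as a simple check: if the model is asked for the count of each object category in a scene and also for the total number of objects, the per-category answers should sum to the total. The following is a minimal illustration of that idea; the function and variable names are illustrative and not taken from the paper.

```python
def counts_consistent(per_category_counts, total_count):
    """Return True if the model's per-category counts sum to its
    reported total object count (the consistency criterion)."""
    return sum(per_category_counts.values()) == total_count

# Hypothetical model answers for a CLEVR-like scene with 6 objects.
answers = {"cube": 3, "sphere": 2, "cylinder": 1}

print(counts_consistent(answers, 6))  # True: counts add up
print(counts_consistent(answers, 7))  # False: total disagrees with categories
```

A model can answer each counting question plausibly in isolation yet still fail this check, which is what makes it a stricter test of reasoning than per-question accuracy alone.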
