Open Access
Visual Question Answering Based on Question Attention Model
Author(s) - Jianing Zhang, Zhaochang Wu, Huajie Zhang, Yunfang Chen
Publication year - 2020
Publication title - Journal of Physics: Conference Series
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.21
H-Index - 85
eISSN - 1742-6596
pISSN - 1742-6588
DOI - 10.1088/1742-6596/1624/2/022022
Subject(s) - question answering , computer science , artificial intelligence , convolutional neural network , image (mathematics) , focus (optics) , field (mathematics) , natural language processing , deep learning , computation , machine learning , pattern recognition (psychology) , algorithm , mathematics , physics , pure mathematics , optics
Visual Question Answering (VQA), the task of answering natural language questions about visual images, has become popular in the field of artificial intelligence. At present, most VQA models extract features from the whole image, which consumes a large amount of computation and leads to complex model structures. In this paper, we propose a VQA method based on a question attention model. First, a Convolutional Neural Network (CNN) is used to extract image features from the input images, and the question text is processed by a Long Short-Term Memory (LSTM) network. Then, we design a question attention module that lets the learning algorithm focus on the most relevant features of the input text. Guided by the question features, our method uses the attention module to assign corresponding weights to the image features and extract the information that is meaningful for generating the answer words. Our method performs significantly better than the LSTM Q+I model on the MS COCO Visual Question Answering (VQA) dataset, with an accuracy improvement of 2%.
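
To make the described pipeline concrete, the following is a minimal PyTorch sketch of question-guided attention over CNN image features, written under assumed dimensions. It is not the authors' published implementation; the layer sizes, region count, and classifier head are illustrative assumptions only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class QuestionGuidedAttention(nn.Module):
    """Hypothetical sketch: weight CNN region features by an LSTM question encoding."""

    def __init__(self, img_dim=2048, q_dim=512, hidden_dim=512, num_answers=1000):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, hidden_dim)   # project image region features
        self.q_proj = nn.Linear(q_dim, hidden_dim)       # project question encoding
        self.att_score = nn.Linear(hidden_dim, 1)        # scalar relevance per region
        self.classifier = nn.Linear(hidden_dim + q_dim, num_answers)

    def forward(self, img_feats, q_feat):
        # img_feats: (batch, regions, img_dim) from a CNN backbone
        # q_feat:    (batch, q_dim) final hidden state of the question LSTM
        proj_img = self.img_proj(img_feats)
        joint = torch.tanh(proj_img + self.q_proj(q_feat).unsqueeze(1))
        weights = F.softmax(self.att_score(joint), dim=1)      # (batch, regions, 1)
        attended = (weights * proj_img).sum(dim=1)             # question-weighted image feature
        return self.classifier(torch.cat([attended, q_feat], dim=1))

# Example shapes (assumed): 36 image regions, batch of 8
model = QuestionGuidedAttention()
logits = model(torch.randn(8, 36, 2048), torch.randn(8, 512))  # (8, 1000) answer scores
```

The key design point the abstract describes is that attention weights are derived from the question representation and applied to the image features, so only the regions relevant to the question contribute to the answer prediction.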
