Open Access
Hierarchical Attention Networks for Multimodal Machine Learning
Author(s) - Haotian Liang, Zhanqing Wang
Publication year - 2022
Publication title - Journal of Physics: Conference Series
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.21
H-Index - 85
eISSN - 1742-6596
pISSN - 1742-6588
DOI - 10.1088/1742-6596/2218/1/012020
Subject(s) - computer science , artificial intelligence , machine learning , natural language processing , question answering , attention network
The Visual Question Answering (VQA) task is to infer the correct answer to a free-form question about a given image. The task is challenging because it requires the model to handle both visual and textual information. Most successful approaches to VQA rely on attention mechanisms, which can capture inter-modal and intra-modal dependencies. In this paper, we propose a new attention-based model for VQA. We use question information to guide the model to concentrate on specific image regions and attributes, and to reason about the answer hierarchically. We also propose a multimodal fusion strategy based on co-attention to fuse visual and textual information. Extensive experiments on the VQA-v2.0 dataset show that our method outperforms several state-of-the-art methods under the same experimental conditions.
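The question-guided attention the abstract describes can be pictured with a minimal sketch. The snippet below is an illustrative assumption, not the authors' implementation: the function name, the projection parameters (W_r, W_q, w_a), and the feature dimensions are hypothetical, and it shows only a single question-guided attention step over image region features in plain NumPy, not the full hierarchical or co-attention fusion architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def question_guided_attention(region_feats, question_vec, W_r, W_q, w_a):
    """Attend over image region features, guided by the question.

    region_feats: (num_regions, d_img) per-region image features
    question_vec: (d_q,) pooled question representation
    W_r, W_q, w_a: learned projections (hypothetical shapes, see usage below)
    Returns the attention-weighted visual summary vector.
    """
    # Project regions and question into a shared space, combine, and score.
    joint = np.tanh(region_feats @ W_r + question_vec @ W_q)  # (num_regions, d_h)
    scores = joint @ w_a                                      # (num_regions,)
    alpha = softmax(scores)                                   # attention weights
    return alpha @ region_feats                               # weighted sum, (d_img,)

# Toy usage with random parameters (illustrative only).
rng = np.random.default_rng(0)
num_regions, d_img, d_q, d_h = 36, 2048, 512, 256
regions = rng.standard_normal((num_regions, d_img))
question = rng.standard_normal(d_q)
W_r = rng.standard_normal((d_img, d_h)) * 0.01
W_q = rng.standard_normal((d_q, d_h)) * 0.01
w_a = rng.standard_normal(d_h) * 0.01
attended = question_guided_attention(regions, question, W_r, W_q, w_a)
print(attended.shape)  # (2048,)
```

A co-attention fusion, as mentioned in the abstract, would additionally attend over the question words using the image representation and then combine the two attended vectors; the sketch above covers only the image-side attention step.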
