
Improve Visual Question Answering Based On Text Feature Extraction
Author(s) - Jing Wang, Dong Yang
Publication year - 2021
Publication title - Journal of Physics: Conference Series
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.21
H-Index - 85
eISSN - 1742-6596
pISSN - 1742-6588
DOI - 10.1088/1742-6596/1856/1/012025
Subject(s) - computer science , question answering , feature extraction , artificial intelligence , pattern recognition , data mining
Visual question answering (VQA) requires a machine to answer natural-language questions about a given image. A VQA system is typically divided into four modules: a text feature extraction module, an image feature extraction module, a text-image feature fusion module, and an answer prediction module. In this paper, the widely used VQA 2.0 [1] dataset is selected for the experiments, and the text extraction module is improved on top of a baseline model: GloVe vectors are used to embed the question text, and a GRU network replaces the traditional LSTM network for encoding the text vectors. Experimental results show that the improved text extraction module raises accuracy by 2.42% over the original model, and the GRU network also accelerates training.
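The abstract does not include the implementation, but the described text module can be sketched as below. This is a minimal PyTorch illustration, not the authors' code: the vocabulary size, the 300-dimensional GloVe embeddings, the 1024-dimensional GRU hidden state, and the 14-token question length are assumed values common in VQA work, not figures taken from the paper.

# Minimal sketch of the described text module (assumed, not the authors' code):
# question tokens -> GloVe embeddings -> GRU -> fixed-size question feature.
import torch
import torch.nn as nn

class QuestionEncoder(nn.Module):
    def __init__(self, vocab_size, embed_dim=300, hidden_dim=1024,
                 glove_weights=None):
        super().__init__()
        if glove_weights is not None:
            # glove_weights: (vocab_size, embed_dim) tensor of pretrained
            # GloVe vectors aligned with the tokenizer's vocabulary.
            self.embed = nn.Embedding.from_pretrained(glove_weights, freeze=False)
        else:
            # Random initialization stands in for GloVe so the sketch runs as-is.
            self.embed = nn.Embedding(vocab_size, embed_dim)
        # GRU in place of the baseline's LSTM for encoding the question.
        self.gru = nn.GRU(embed_dim, hidden_dim, batch_first=True)

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) integer question tokens
        emb = self.embed(token_ids)   # (batch, seq_len, embed_dim)
        _, h_n = self.gru(emb)        # h_n: (1, batch, hidden_dim)
        return h_n.squeeze(0)         # (batch, hidden_dim) question feature

# Usage: encode a batch of two tokenized questions (dummy token ids).
encoder = QuestionEncoder(vocab_size=10000)
q = torch.randint(0, 10000, (2, 14))  # two 14-token questions (assumed length)
print(encoder(q).shape)               # torch.Size([2, 1024])

Swapping the LSTM for a GRU keeps the same encoder interface while removing one gate and its parameters, which is the usual explanation for the faster training the abstract reports.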