
Improve Visual Question Answering Based On Text Feature Extraction
Author(s) - Jing Wang, Dong Yang
Publication year - 2021
Publication title - Journal of Physics: Conference Series
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.21
H-Index - 85
eISSN - 1742-6596
pISSN - 1742-6588
DOI - 10.1088/1742-6596/1856/1/012025
Subject(s) - computer science , question answering , feature extraction , artificial intelligence , pattern recognition , data mining
Visual question answering (VQA) requires a machine to answer natural-language questions about a given image. A VQA system is typically divided into four modules: a text feature extraction module, an image feature extraction module, a text-image feature fusion module, and an answer prediction module. In this paper, the widely used VQA 2.0 [1] dataset is selected for the experiments, and the text extraction module is improved on top of a baseline model: GloVe vectors are used to embed the question text, and a GRU network replaces the traditional LSTM network for encoding the text vectors. Experimental results show that the improved text extraction module raises accuracy by 2.42% over the original model, and the GRU network also accelerates training.
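The abstract does not include the implementation, but the described text module can be sketched as below. This is a minimal PyTorch illustration, not the authors' code: the vocabulary size, the 300-dimensional GloVe embeddings, the 1024-dimensional GRU hidden state, and the 14-token question length are assumed values common in VQA work, not figures taken from the paper.

# Minimal sketch of the described text module (assumed, not the authors' code):
# question tokens -> GloVe embeddings -> GRU -> fixed-size question feature.
import torch
import torch.nn as nn

class QuestionEncoder(nn.Module):
    def __init__(self, vocab_size, embed_dim=300, hidden_dim=1024,
                 glove_weights=None):
        super().__init__()
        if glove_weights is not None:
            # glove_weights: (vocab_size, embed_dim) tensor of pretrained
            # GloVe vectors aligned with the tokenizer's vocabulary.
            self.embed = nn.Embedding.from_pretrained(glove_weights, freeze=False)
        else:
            # Random initialization stands in for GloVe so the sketch runs as-is.
            self.embed = nn.Embedding(vocab_size, embed_dim)
        # GRU in place of the baseline's LSTM for encoding the question.
        self.gru = nn.GRU(embed_dim, hidden_dim, batch_first=True)

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) integer question tokens
        emb = self.embed(token_ids)   # (batch, seq_len, embed_dim)
        _, h_n = self.gru(emb)        # h_n: (1, batch, hidden_dim)
        return h_n.squeeze(0)         # (batch, hidden_dim) question feature

# Usage: encode a batch of two tokenized questions (dummy token ids).
encoder = QuestionEncoder(vocab_size=10000)
q = torch.randint(0, 10000, (2, 14))  # two 14-token questions (assumed length)
print(encoder(q).shape)               # torch.Size([2, 1024])

Swapping the LSTM for a GRU keeps the same encoder interface while removing one gate and its parameters, which is the usual explanation for the faster training the abstract reports.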