Open Access
Evaluating State-of-the-Art Visual Question Answering Models' Ability to Answer Complex Counting Questions
Author(s) -
Krish Gangaraju,
Khaled Jedoui
Publication year - 2021
Publication title -
Journal of Student Research
Language(s) - English
Resource type - Journals
ISSN - 2167-1907
DOI - 10.47611/jsrhs.v10i4.2446
Subject(s) - question answering , computer science , artificial intelligence , word (group theory) , natural language processing , deep learning , questions and answers , machine learning , mathematics , geometry
Visual Question Answering (VQA) is a relatively new area of computer science involving computer vision, natural language processing, and deep learning. A VQA model answers questions (currently in English) about particular images it is shown. Since the original VQA dataset was made publicly available in 2014, datasets such as OK-VQA, Visual7W, and CLEVR have explored new concepts, various algorithms have exceeded previous benchmarks, and new methods to evaluate these models have emerged. However, to the best of my research, I have not seen math or word problems integrated into any of the VQA datasets. In this paper, I incorporate the four basic mathematical operations into the ‘counting’ questions of the CLEVR dataset and compare how different models fare against this modified dataset of 100,000 images and 2.4 million questions. The models we used achieved roughly 50% validation accuracy within 4 epochs, showing room for improvement. If VQA models can assimilate mathematics into their question-understanding ability, this could open new pathways for the future.
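As an illustration of the idea described in the abstract, the sketch below shows one way arithmetic could be folded into CLEVR-style counting questions. This is a hypothetical reconstruction, not the authors' actual generation pipeline: the function names, question templates, and the rule for skipping ill-posed questions (negative or non-integer answers) are all assumptions.

```python
# Hypothetical sketch of augmenting CLEVR-style counting questions with
# the four basic arithmetic operations (not the paper's actual pipeline).

# Map an operation's surface form to its integer semantics.
OPS = {
    "plus": lambda a, b: a + b,
    "minus": lambda a, b: a - b,
    "times": lambda a, b: a * b,
    # Only keep divisions that yield a whole number.
    "divided by": lambda a, b: a // b if b != 0 and a % b == 0 else None,
}

def make_arithmetic_question(counts, shape_a, shape_b, op_name):
    """Combine two per-shape object counts from a scene into one
    arithmetic counting question, or return None if ill-posed."""
    answer = OPS[op_name](counts[shape_a], counts[shape_b])
    if answer is None or answer < 0:
        return None  # skip negative or non-integer answers
    question = (f"What is the number of {shape_a}s {op_name} "
                f"the number of {shape_b}s?")
    return question, answer

# Example scene with 3 cubes and 2 spheres:
counts = {"cube": 3, "sphere": 2}
q, a = make_arithmetic_question(counts, "cube", "sphere", "plus")
# q == "What is the number of cubes plus the number of spheres?", a == 5
```

Generating every valid (shape pair, operation) combination per scene is how a base set of images could be expanded into millions of question-answer pairs, consistent with the 100,000-image / 2.4-million-question figures in the abstract.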
