
Enhancing Visual Question Answering for Multiple Choice Questions
Author(s) - Rashi Goel, Harsh Nandwani, Eshaan Shah, Ashalatha Nayak, Archana Praveen Kumar
Publication year - 2025
Publication title - IEEE Access
Language(s) - English
Resource type - Magazines
SCImago Journal Rank - 0.587
H-Index - 127
eISSN - 2169-3536
DOI - 10.1109/access.2025.3572529
Subject(s) - aerospace , bioengineering , communication, networking and broadcast technologies , components, circuits, devices and systems , computing and processing , engineered materials, dielectrics and plasmas , engineering profession , fields, waves and electromagnetics , general topics for engineers , geoscience , nuclear engineering , photonics and electrooptics , power, energy and industry applications , robotics and control systems , signal processing and analysis , transportation
This paper examines enhancements to Visual Question Answering (VQA) through systematic hyperparameter tuning and the use of advanced image and text encoders. The study focuses on adapting these models to Multiple-Choice Question (MCQ) formats, aiming to improve their accuracy and applicability. An MCQ consists of a question stem and a set of options, from which the correct answer (the key) must be identified among the distractors. Presenting the options gives the model some context about the correct answer, which improves its performance over a plain multiclass classification task. Through a comparative analysis of varied hyperparameter sets, the research shows that precise hyperparameter adjustments improve the performance of VQA systems and their reasoning capabilities across several datasets, including samples of real-world images and academic questions. This demonstrates the potential of VQA models for robust application in both educational and practical scenarios.
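
To make the MCQ formulation concrete, the following is a minimal sketch of how the key can be selected by scoring each option against a joint image-question representation, rather than classifying over a fixed answer vocabulary. It is not the authors' implementation: the `MCQ` class, the element-wise-product `fuse` step, the cosine scoring, and the toy vectors are all illustrative assumptions standing in for real image and text encoders.

```python
from dataclasses import dataclass
from math import sqrt
from typing import List, Sequence


@dataclass
class MCQ:
    """Hypothetical container for one multiple-choice question."""
    stem: str            # the question text
    options: List[str]   # the key plus the distractors
    key_index: int       # position of the correct option (the key)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def fuse(image_vec: Sequence[float], stem_vec: Sequence[float]) -> List[float]:
    """Assumed fusion step: element-wise product of the two embeddings."""
    return [i * q for i, q in zip(image_vec, stem_vec)]


def pick_option(image_vec: Sequence[float],
                stem_vec: Sequence[float],
                option_vecs: Sequence[Sequence[float]]) -> int:
    """Return the index of the option most similar to the fused embedding."""
    joint = fuse(image_vec, stem_vec)
    scores = [cosine(joint, opt) for opt in option_vecs]
    return max(range(len(scores)), key=scores.__getitem__)


def accuracy(predictions: Sequence[int], keys: Sequence[int]) -> float:
    """Fraction of questions where the predicted option is the key."""
    correct = sum(p == k for p, k in zip(predictions, keys))
    return correct / len(keys)


# Toy embeddings (made up for illustration; a real system would obtain
# these from trained image and text encoders).
question = MCQ(stem="What colour is the car?",
               options=["green", "red", "blue"],
               key_index=1)
image_vec = [1.0, 0.0, 1.0]
stem_vec = [1.0, 1.0, 0.5]
option_vecs = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.5], [0.0, 0.0, 1.0]]

predicted = pick_option(image_vec, stem_vec, option_vecs)
print(question.options[predicted], predicted == question.key_index)
```

Because the candidate answers are supplied with each question, the model only has to rank a handful of options instead of choosing from every possible answer, which is one way to read the abstract's claim that the MCQ format provides context about the correct answer.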