Open Access
Measuring Machine Intelligence Through Visual Question Answering
Author(s) -
Zitnick C. Lawrence,
Agrawal Aishwarya,
Antol Stanislaw,
Mitchell Margaret,
Batra Dhruv,
Parikh Devi
Publication year - 2016
Publication title - AI Magazine
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.597
H-Index - 79
eISSN - 2371-9621
pISSN - 0738-4602
DOI - 10.1609/aimag.v37i1.2647
Subject(s) - closed captioning, computer science, task (project management), question answering, artificial intelligence, set (abstract data type), machine learning, human intelligence, human–machine system, natural language processing, human–computer interaction, image (mathematics), engineering, systems engineering, programming language
As machines have become more intelligent, there has been a renewed interest in methods for measuring their intelligence. A common approach is to propose tasks at which humans excel but machines find difficult. However, an ideal task should also be easy to evaluate and difficult to game. We begin with a case study exploring the recently popular task of image captioning and its limitations as a task for measuring machine intelligence. An alternative and more promising task is visual question answering, which tests a machine's ability to reason about language and vision together. We describe a data set, unprecedented in size and created for this task, that contains more than 760,000 human-generated questions about images. Using around 10 million human-generated answers, machines can be easily evaluated.
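The abstract's claim that the answer set makes evaluation easy rests on consensus scoring: each question is answered by multiple human annotators, and a predicted answer is scored by how many annotators agree with it. The min(count/3, 1) rule below is the widely cited VQA accuracy metric; the function name, normalization, and the example data are illustrative assumptions, a minimal sketch rather than the authors' released evaluation code.

```python
def vqa_accuracy(predicted: str, human_answers: list[str]) -> float:
    """Score a predicted answer against a set of human answers.

    An answer receives full credit if at least 3 annotators gave the
    same answer, and proportional partial credit otherwise.
    """
    normalized = predicted.strip().lower()
    matches = sum(1 for a in human_answers if a.strip().lower() == normalized)
    return min(matches / 3.0, 1.0)


if __name__ == "__main__":
    # Hypothetical annotations for "What color is the bus?"
    answers = ["red"] * 7 + ["dark red", "maroon", "red and white"]
    print(vqa_accuracy("red", answers))     # 1.0  -- strong consensus
    print(vqa_accuracy("maroon", answers))  # ~0.33 -- one annotator agrees
```

Because the score depends only on string matching against stored human answers, evaluation requires no human in the loop at test time, which is what makes the task easy to evaluate at the scale of 10 million answers.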