Describing Common Human Visual Actions in Images
Author(s) - Matteo Ruggero Ronchi, Pietro Perona
Publication year - 2015
Language(s) - English
Resource type - Conference proceedings
DOI - 10.5244/c.29.52
Subject(s) - computer science , artificial intelligence , computer vision , pattern recognition , natural language processing , lexicon , verb , semantics , action recognition , dataset annotation
Which common human actions and interactions are recognizable in monocular still images? Which involve objects and/or other people? How many actions is a person performing at a time? We address these questions by exploring the actions and interactions that are detectable in the images of the MS COCO dataset. We make two main contributions. First, a list of 140 common ‘visual actions’, obtained by analyzing the largest online verb lexicon currently available for English (VerbNet) and the human sentences used to describe images in MS COCO. Second, a complete set of annotations for those ‘visual actions’, composed of subject-object pairs and the associated verbs, which we call COCO-a (a for ‘actions’). COCO-a is larger than existing action datasets in terms of the number of action instances, and is unique because it is data-driven rather than experimenter-biased. Other unique features are that it is exhaustive and that all subjects and objects are localized. A statistical analysis of the accuracy of our annotations and of each action, interaction and subject-object combination is provided.
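The annotations described in the abstract take the form of subject-object pairs with one or more associated visual-action verbs, where subjects and objects are localized in the image. The Python sketch below is a rough illustration of that data structure: it models a single interaction record and counts how many visual actions each subject performs, mirroring the question of how many actions a person performs at a time. The class and field names (VisualActionAnnotation, image_id, subject_bbox, object_bbox, visual_actions) are assumptions made for illustration and do not reproduce the actual COCO-a release format.

from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class VisualActionAnnotation:
    """One subject-object interaction with its visual-action labels.

    NOTE: field names are hypothetical, not the real COCO-a schema.
    """
    image_id: int                        # MS COCO image containing the interaction
    subject_bbox: List[float]            # [x, y, width, height] of the acting person
    object_bbox: Optional[List[float]]   # None for solo actions (e.g. 'walk')
    visual_actions: List[str]            # one or more of the 140 visual-action labels


def actions_per_subject(
    annotations: List[VisualActionAnnotation],
) -> Dict[Tuple[int, Tuple[float, ...]], int]:
    """Count how many visual actions each (image, subject) pair performs."""
    counts: Dict[Tuple[int, Tuple[float, ...]], int] = {}
    for ann in annotations:
        key = (ann.image_id, tuple(ann.subject_bbox))
        counts[key] = counts.get(key, 0) + len(ann.visual_actions)
    return counts


if __name__ == "__main__":
    sample = [
        VisualActionAnnotation(
            image_id=42,
            subject_bbox=[10.0, 20.0, 50.0, 120.0],
            object_bbox=[70.0, 40.0, 30.0, 30.0],
            visual_actions=["hold", "look at"],
        ),
    ]
    print(actions_per_subject(sample))  # {(42, (10.0, 20.0, 50.0, 120.0)): 2}

Representing solo actions with object_bbox set to None keeps single-person actions in the same record type as person-object and person-person interactions; this is one possible design choice, not necessarily the one used in the released dataset.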
