Video data mining using configurations of viewpoint invariant regions
Author(s) - Josef Sivic, Andrew Zisserman
Publication year - 2004
Publication title - Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004.
Language(s) - English
DOI - 10.1109/cvpr.2004.261
We describe a method for obtaining the principal objects, characters and scenes in a video by measuring the reoccurrence of spatial configurations of viewpoint invariant features. We investigate two aspects of the problem: the scale of the configurations, and the similarity requirements for clustering configurations. The problem is challenging firstly because an object can undergo substantial changes in imaged appearance throughout a video (due to viewpoint and illumination change, and partial occlusion), and secondly because configurations are detected imperfectly, so that inexact patterns must be matched. The novelty of the method is that viewpoint invariant features are used to form the configurations, and that efficient methods from the text analysis literature are employed to reduce the matching complexity. Examples of 'mined' objects are shown for a feature length film and a sitcom.