End-to-End, Single-Stream Temporal Action Detection in Untrimmed Videos
Author(s) - Shyamal Buch, Víctor Escorcia, Bernard Ghanem, Juan Carlos Niebles
Publication year - 2017
Language(s) - English
Resource type - Conference proceedings
DOI - 10.5244/c.31.93
Subject(s) - computer science, action (physics), artificial intelligence, pattern recognition (psychology), quantum mechanics, physics
We present a new, intuitive, end-to-end approach for temporal action detection in untrimmed videos. Our architecture, Single-Stream Temporal Action Detection (SS-TAD), integrates joint action detection with its semantic sub-tasks in a single unifying end-to-end framework. To train this deep recurrent architecture, we enforce semantic constraints on intermediate modules and gradually relax them as learning progresses. We find that this dynamic learning scheme enables SS-TAD to achieve higher overall detection performance in fewer training epochs. By design, our single-pass network is efficient: it can operate at 701 frames per second while outperforming state-of-the-art methods for temporal action detection on THUMOS’14.
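
The idea of constraints that are gradually relaxed during training can be illustrated with a minimal sketch. The abstract does not say how the constraints are formulated or scheduled, so everything below is an assumption made for illustration: the linear decay, the function names `constraint_weight` and `combined_loss`, and the treatment of the constraints as auxiliary loss terms are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def constraint_weight(epoch: int, total_epochs: int,
                      start: float = 1.0, end: float = 0.0) -> float:
    """Hypothetical schedule: linearly decay the constraint weight
    from `start` (first epoch) to `end` (last epoch)."""
    if total_epochs <= 1:
        return end
    # Clamp the epoch index into [0, total_epochs - 1].
    clamped = min(max(epoch, 0), total_epochs - 1)
    frac = clamped / (total_epochs - 1)
    return start + (end - start) * frac


def combined_loss(detection_loss: float,
                  auxiliary_losses: Sequence[float],
                  epoch: int, total_epochs: int) -> float:
    """Assumed form: final detection loss plus down-weighted
    auxiliary (intermediate-module) losses."""
    w = constraint_weight(epoch, total_epochs)
    return detection_loss + w * sum(auxiliary_losses)


if __name__ == "__main__":
    # Print the constraint weight and combined loss for a 5-epoch run.
    for epoch in range(5):
        print(epoch, constraint_weight(epoch, 5),
              combined_loss(1.0, [0.5, 0.5], epoch, 5))
```

In this sketch the intermediate modules are strongly supervised early on and the auxiliary terms fade out by the final epoch, leaving only the detection objective. Other decay shapes (exponential, step-wise) would fit the same description equally well.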
