A computational approach for progressive architecture shrinkage in action recognition

Author(s) - Tomei Matteo, Baraldi Lorenzo, Fiameni Giuseppe, Bronzin Simone, Cucchiara Rita

Publication year - 2022

Publication title - Software: Practice and Experience

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 0.437

H-Index - 70

eISSN - 1097-024X

pISSN - 0038-0644

DOI - 10.1002/spe.3035

Subject(s) - computer science, memory footprint, supercomputer, computational complexity theory, artificial intelligence, computer engineering, computer architecture, parallel computing, distributed computing, algorithm, operating system

Efficiency is central to video understanding, and more efficient spatiotemporal deep networks are needed before such models can be used in production scenarios. In this work, we propose a methodology for reducing the computational complexity of a video understanding backbone while limiting the drop in accuracy caused by architectural changes. Our approach, named Progressive Architecture Shrinkage, applies a sequence of reduction operators to the hyperparameters of a network to reduce its computational footprint. The sequence of operations is optimized automatically with a coordinate-descent scheme, and the approach transfers knowledge from both the initial network and the previous stages of the shrinking process by combining Knowledge Distillation with an adaptive fine-tuning strategy. Because each iteration of the shrinking algorithm requires training a large-scale video understanding network, we perform experiments on MARCONI 100, a supercomputer equipped with an IBM Power9 architecture and NVIDIA Volta GPUs. Experimental evaluations are conducted using two backbones and three different action recognition benchmarks. We show that our approach maintains high accuracy while reducing the number of multiply-add operations by a factor of four with respect to the original architectures. Code will be made available.
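
The shrinking loop described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it is one plausible reading of "coordinate descent over reduction operators", in which each step tries every operator on the current configuration, scores the resulting candidates, and keeps the best one until a target cost is reached. The `Config` fields, the `madds` cost model, the `OPERATORS` table, and the `toy_accuracy` scorer are all hypothetical; in the paper's setting the scoring step would correspond to training a candidate network with distillation and measuring its validation accuracy.

```python
import math
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Config:
    frames: int      # temporal length of the input clip
    resolution: int  # spatial side of the input, in pixels
    width: int       # number of channels per layer
    depth: int       # number of layers


def madds(cfg):
    # Toy cost model (assumption): linear in frames and depth,
    # quadratic in resolution and width.
    return cfg.frames * cfg.resolution ** 2 * cfg.width ** 2 * cfg.depth


# Hypothetical reduction operators, one per hyperparameter ("coordinate").
OPERATORS = {
    "frames": lambda c: replace(c, frames=max(1, c.frames // 2)),
    "resolution": lambda c: replace(c, resolution=max(32, c.resolution * 3 // 4)),
    "width": lambda c: replace(c, width=max(8, c.width * 3 // 4)),
    "depth": lambda c: replace(c, depth=max(1, c.depth // 2)),
}


def toy_accuracy(cfg):
    # Stand-in for "train the candidate with distillation, then validate".
    # The weights are arbitrary and only make the example deterministic.
    return (0.4 * math.log2(cfg.frames) + 0.3 * math.log2(cfg.resolution)
            + 0.2 * math.log2(cfg.width) + 0.1 * math.log2(cfg.depth))


def shrink(cfg, target_madds, evaluate):
    """Greedily apply one reduction operator per step until the cost target is met."""
    history = []
    while madds(cfg) > target_madds:
        # Try every operator on the current configuration.
        candidates = {name: op(cfg) for name, op in OPERATORS.items()}
        # Discard operators that have hit their lower bound and no longer reduce cost.
        candidates = {n: c for n, c in candidates.items() if madds(c) < madds(cfg)}
        if not candidates:
            break  # nothing left to shrink
        # Keep the candidate that scores best under the evaluation function.
        best = max(candidates, key=lambda n: evaluate(candidates[n]))
        cfg = candidates[best]
        history.append(best)
    return cfg, history


if __name__ == "__main__":
    start = Config(frames=16, resolution=224, width=64, depth=8)
    final, steps = shrink(start, madds(start) // 4, toy_accuracy)
    print(steps, final, madds(start) / madds(final))
```

The sketch selects candidates by score alone, even though the operators here reduce cost by different amounts. A real selection rule might instead weigh accuracy against the cost saved at each step, and the abstract does not specify which criterion the authors use.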
