Combination of temporal‐channels correlation information and bilinear feature for action recognition

Cai Jiahui, Hu Jianguo, Li Shiren, Lin Jialing, Wang Jun

Open Access

Publication year - 2020

Publication title - IET Computer Vision

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 0.38

H-Index - 37

eISSN - 1751-9640

pISSN - 1751-9632

DOI - 10.1049/iet-cvi.2020.0023

Subject(s) - pooling, bilinear interpolation, computer science, convolutional neural network, focus (optics), feature (linguistics), channel (broadcasting), pattern recognition (psychology), representation (politics), artificial intelligence, domain (mathematical analysis), feature learning, data mining, machine learning, computer vision, mathematics, telecommunications, politics, mathematical analysis, linguistics, philosophy, physics, law, political science, optics

In this study, the authors focus on improving the spatio–temporal representation ability of three‐dimensional (3D) convolutional neural networks (CNNs) in the video domain. They observe two shortcomings: (i) convolutional filters are dedicated only to learning local representations along the input channels, and they treat all channel‐wise features equally, without emphasising the important ones; (ii) the traditional global average pooling layer captures only first‐order statistics, ignoring finer details that are useful for classification. To mitigate these problems, they propose two modules to boost the performance of 3D CNNs: a temporal‐channel correlation (TCC) module and a bilinear pooling module. The TCC module captures inter‐channel correlations over the temporal domain. It also generates channel‐wise dependencies, which are used to adaptively re‐weight the channel‐wise features, so that the network can focus on learning the important ones. The bilinear pooling module captures more complex second‐order statistics in the deep features and generates a second‐order classification vector. Combining the first‐order and second‐order classification vectors yields more accurate classification results. Extensive experiments show that adding the proposed modules to the I3D network consistently improves performance and outperforms state‐of‐the‐art methods. The code and models are available at https://github.com/caijh33/I3D_TCC_Bilinear .
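
The difference between first‐order and second‐order pooling described in the abstract can be illustrated with a small, framework‐free sketch. This is not the authors' implementation (that is in the repository linked above); the function names, the list‐of‐lists layout (C channels, each flattened over its T×H×W positions), the signed square root with L2 normalisation, and the externally supplied gates are all assumptions made for illustration.

```python
import math


def global_average_pool(features):
    """First-order statistic: the mean of each channel over all positions.

    features: list of C channels, each a list of N activations, where N
    stands for the flattened T*H*W positions (layout is an assumption).
    """
    return [sum(channel) / len(channel) for channel in features]


def bilinear_pool(features):
    """Second-order statistic: channel-by-channel products averaged over
    all positions, flattened into a vector of length C*C."""
    c, n = len(features), len(features[0])
    flat = [
        sum(features[i][k] * features[j][k] for k in range(n)) / n
        for i in range(c)
        for j in range(c)
    ]
    # Signed square root and L2 normalisation are common post-processing
    # steps for bilinear features; assumed here, not taken from the paper.
    flat = [math.copysign(math.sqrt(abs(v)), v) for v in flat]
    norm = math.sqrt(sum(v * v for v in flat)) or 1.0
    return [v / norm for v in flat]


def reweight_channels(features, gates):
    """Generic channel-wise re-weighting: scale each channel by one gate.

    How the TCC module computes its gates is not given in the abstract,
    so the gates are simply passed in (hypothetical interface).
    """
    return [[g * v for v in channel] for g, channel in zip(gates, features)]


# Two toy feature maps with identical channel means but opposite
# co-activation patterns between their two channels.
same_sign = [[1, -1, 1, -1], [1, -1, 1, -1]]
opposite_sign = [[1, -1, 1, -1], [-1, 1, -1, 1]]

print(global_average_pool(same_sign))      # [0.0, 0.0]
print(global_average_pool(opposite_sign))  # [0.0, 0.0]
print(bilinear_pool(same_sign))            # [0.5, 0.5, 0.5, 0.5]
print(bilinear_pool(opposite_sign))        # [0.5, -0.5, -0.5, 0.5]
```

Global average pooling cannot tell the two toy inputs apart, while the second‐order vector differs in the sign of its cross‐channel entries, which is the kind of finer detail the abstract says first‐order statistics ignore.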
