Open Access
Multi‐mode neural network for human action recognition
Author(s) -
Zhao Haohua,
Xue Weichen,
Li Xiaobo,
Gu Zhangxuan,
Niu Li,
Zhang Liqing
Publication year - 2020
Publication title -
IET Computer Vision
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.38
H-Index - 37
eISSN - 1751-9640
pISSN - 1751-9632
DOI - 10.1049/iet-cvi.2019.0761
Subject(s) - computer science , artificial intelligence , convolutional neural network , frame (networking) , feature extraction , pattern recognition (psychology) , artificial neural network , deep learning , recurrent neural network
Video data have two intrinsic modes, in‐frame and temporal. It is beneficial to incorporate static in‐frame features when acquiring dynamic features for video applications. However, some existing methods, such as recurrent neural networks, do not perform well, and others, such as 3D convolutional neural networks (CNNs), are both memory consuming and time consuming. This study proposes an effective framework that takes advantage of deep learning for static image feature extraction to tackle video data. After extracting in‐frame feature vectors using a pretrained deep network, the authors integrate them into a multi‐mode feature matrix, which preserves the multi‐mode structure and high‐level representation. They propose two models for follow‐up classification. The authors first introduce a temporal CNN, which feeds the multi‐mode feature matrix directly into a CNN. However, they show that the characteristics of the multi‐mode features differ significantly across modes. The authors therefore further propose the multi‐mode neural network (MMNN), in which different modes deploy different types of layers. They evaluate their algorithm on the task of human action recognition. The experimental results show that the MMNN achieves much better performance than existing long short‐term memory‐based methods and consumes far fewer resources than existing 3D end‐to‐end models.
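The abstract's pipeline can be sketched as follows: per-frame feature vectors from a pretrained network are stacked into a multi-mode feature matrix (temporal mode along rows, in-frame feature mode along columns), which is then passed to a temporal convolution. This is a minimal illustrative sketch, not the authors' implementation; the shapes, the random stand-in features, and the smoothing kernel are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed shapes: 16 frames, 512-dim feature vector per frame.
T, D = 16, 512

# In practice each row would come from a pretrained deep network applied
# to one video frame; random vectors stand in here.
frame_features = [rng.standard_normal(D) for _ in range(T)]

# Multi-mode feature matrix: temporal mode along rows,
# in-frame (feature) mode along columns.
M = np.stack(frame_features)          # shape (T, D)

def temporal_conv(M, kernel):
    """Valid 1-D convolution along the temporal axis, shared across features."""
    k = len(kernel)
    out = np.empty((M.shape[0] - k + 1, M.shape[1]))
    for t in range(out.shape[0]):
        # Contract the kernel with a window of k consecutive frames.
        out[t] = np.tensordot(kernel, M[t:t + k], axes=1)
    return out

kernel = np.array([0.25, 0.5, 0.25])  # toy temporal kernel (assumption)
H = temporal_conv(M, kernel)

print(M.shape)  # (16, 512)
print(H.shape)  # (14, 512) -- valid convolution shortens the temporal mode
```

The point of the matrix layout is that the two modes can then be treated differently, which is what the MMNN does by deploying different layer types per mode.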
