Open Access
Multi‐mode neural network for human action recognition
Author(s) -
Zhao Haohua,
Xue Weichen,
Li Xiaobo,
Gu Zhangxuan,
Niu Li,
Zhang Liqing
Publication year - 2020
Publication title -
IET Computer Vision
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.38
H-Index - 37
eISSN - 1751-9640
pISSN - 1751-9632
DOI - 10.1049/iet-cvi.2019.0761
Subject(s) - computer science , artificial intelligence , convolutional neural network , frame (networking) , feature extraction , pattern recognition (psychology) , artificial neural network , deep learning , recurrent neural network
Video data have two intrinsic modes, in‐frame and temporal. It is beneficial to incorporate static in‐frame features when acquiring dynamic features for video applications. However, some existing methods, such as recurrent neural networks, do not perform well, and others, such as 3D convolutional neural networks (CNNs), are both memory consuming and time consuming. This study proposes an effective framework that takes advantage of deep learning for static image feature extraction to tackle video data. After extracting in‐frame feature vectors using a pretrained deep network, the authors integrate them into a multi‐mode feature matrix, which preserves the multi‐mode structure and high‐level representation. They propose two models for follow‐up classification. The authors first introduce a temporal CNN, which feeds the multi‐mode feature matrix directly into a CNN. However, they show that the characteristics of the multi‐mode features differ significantly across modes. The authors therefore further propose the multi‐mode neural network (MMNN), in which different modes deploy different types of layers. They evaluate their algorithm on the task of human action recognition. The experimental results show that the MMNN achieves much better performance than existing long short‐term memory‐based methods and consumes far fewer resources than existing 3D end‐to‐end models.
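The abstract's pipeline can be sketched as follows: per-frame feature vectors from a pretrained network are stacked into a multi-mode feature matrix (temporal mode along rows, in-frame feature mode along columns), which is then passed to a temporal convolution. This is a minimal illustrative sketch, not the authors' implementation; the shapes, the random stand-in features, and the smoothing kernel are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed shapes: 16 frames, 512-dim feature vector per frame.
T, D = 16, 512

# In practice each row would come from a pretrained deep network applied
# to one video frame; random vectors stand in here.
frame_features = [rng.standard_normal(D) for _ in range(T)]

# Multi-mode feature matrix: temporal mode along rows,
# in-frame (feature) mode along columns.
M = np.stack(frame_features)          # shape (T, D)

def temporal_conv(M, kernel):
    """Valid 1-D convolution along the temporal axis, shared across features."""
    k = len(kernel)
    out = np.empty((M.shape[0] - k + 1, M.shape[1]))
    for t in range(out.shape[0]):
        # Contract the kernel with a window of k consecutive frames.
        out[t] = np.tensordot(kernel, M[t:t + k], axes=1)
    return out

kernel = np.array([0.25, 0.5, 0.25])  # toy temporal kernel (assumption)
H = temporal_conv(M, kernel)

print(M.shape)  # (16, 512)
print(H.shape)  # (14, 512) -- valid convolution shortens the temporal mode
```

The point of the matrix layout is that the two modes can then be treated differently, which is what the MMNN does by deploying different layer types per mode.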
