An enhanced 3DCNN‐ConvLSTM for spatiotemporal multimedia data analysis
Author(s) -
Wang Tian,
Li Jiakun,
Zhang Mengyi,
Zhu Aichun,
Snoussi Hichem,
Choi Chang
Publication year - 2019
Publication title -
Concurrency and Computation: Practice and Experience
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.309
H-Index - 67
eISSN - 1532-0634
pISSN - 1532-0626
DOI - 10.1002/cpe.5302
Subject(s) - pooling , computer science , dropout (neural networks) , artificial intelligence , normalization (sociology) , action recognition , task (project management) , layer (electronics) , pattern recognition (psychology) , feature (linguistics) , machine learning , linguistics , chemistry , philosophy , management , organic chemistry , sociology , anthropology , economics , class (philosophy)
Summary: Human action recognition remains a challenging and complex task in computer vision. Combining a CNN with an RNN is a common and effective network structure for this task; specifically, we use a 3DCNN as the CNN component and a ConvLSTM as the RNN component. We divide each video evenly into multiple temporal segments and compress each segment into a single feature map with a pooling layer. Our main contribution is adding pooling, dropout, and batch normalization layers into the ConvLSTM. We evaluate the model on the KTH, UCF‐11, and HMDB51 datasets and achieve high action-recognition accuracy.
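The segmentation and compression step described in the summary can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation. The helper names (`split_into_segments`, `temporal_pool`, `compress_video`) are hypothetical. Feature maps are plain nested lists rather than 3DCNN outputs. When the frame count is not divisible, the first segments each get one extra frame. Element-wise max or mean pooling over time is assumed, since the summary does not specify the pooling type. In the full model, the resulting sequence of pooled maps would be the input to the ConvLSTM.

```python
from typing import List, Sequence

# Hypothetical stand-in for one 2-D feature map (H x W); not the paper's tensors.
Frame = List[List[float]]


def split_into_segments(frames: Sequence, num_segments: int) -> List[list]:
    """Divide frames into num_segments contiguous, near-equal temporal segments.

    Assumption: when len(frames) is not divisible by num_segments,
    the earliest segments each receive one extra frame.
    """
    if num_segments <= 0 or num_segments > len(frames):
        raise ValueError("num_segments must be in [1, len(frames)]")
    base, extra = divmod(len(frames), num_segments)
    segments, start = [], 0
    for i in range(num_segments):
        size = base + (1 if i < extra else 0)
        segments.append(list(frames[start:start + size]))
        start += size
    return segments


def temporal_pool(segment: Sequence[Frame], mode: str = "max") -> Frame:
    """Collapse T maps of shape H x W into one H x W map by pooling over time."""
    if mode not in ("max", "mean"):
        raise ValueError("mode must be 'max' or 'mean'")
    h, w = len(segment[0]), len(segment[0][0])
    pooled = []
    for r in range(h):
        row = []
        for c in range(w):
            values = [f[r][c] for f in segment]  # one pixel across all frames
            row.append(max(values) if mode == "max" else sum(values) / len(values))
        pooled.append(row)
    return pooled


def compress_video(frames: Sequence[Frame], num_segments: int, mode: str = "max") -> List[Frame]:
    """Return one pooled map per temporal segment (the ConvLSTM input sequence)."""
    return [temporal_pool(seg, mode) for seg in split_into_segments(frames, num_segments)]


if __name__ == "__main__":
    # Six 2x2 "frames", where frame t is filled with the value t.
    video = [[[float(t)] * 2 for _ in range(2)] for t in range(6)]
    for m in compress_video(video, num_segments=3):
        print(m)
```

With six frames and three segments, each segment covers two consecutive frames, so max pooling keeps the later frame's values in each pair. Pooling over time like this shortens the sequence the recurrent part must process, at the cost of discarding within-segment ordering.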
