Open Access
Compositional action recognition with multi-view feature fusion
Author(s) - Zhicheng Zhao, Yingan Liu, Lei Ma
Publication year - 2022
Publication title - PLoS ONE
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.99
H-Index - 332
ISSN - 1932-6203
DOI - 10.1371/journal.pone.0266259
Subject(s) - computer science, artificial intelligence, action recognition, pattern recognition, machine learning, feature fusion, modality (human–computer interaction)
Most action recognition methods treat an activity as a single event in a video clip. Recently, representing activities as combinations of verbs and nouns has been shown to be effective for action recognition, as it makes the compositional structure of an action explicit and improves action understanding. However, representation learning that exploits cross-view or cross-modality information remains under-explored. To exploit the complementary information between multiple views, we propose a feature fusion framework that proceeds in two steps: extraction of appearance features and fusion of multi-view features. We validate our approach on two action recognition datasets, IKEA ASM and LEMMA. We demonstrate that multi-view fusion generalizes effectively across appearances and recognizes previously unseen actions on interacted objects, surpassing current state-of-the-art methods. In particular, on the IKEA ASM dataset, the multi-view fusion approach improves top-1 accuracy by 18.1% over the single-view approach.
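As an illustration of the two-step pipeline the abstract describes (per-view appearance feature extraction, then multi-view fusion), the following is a minimal PyTorch sketch. The module structure, feature dimension, fusion-by-concatenation choice, and class count are all illustrative assumptions, not the authors' architecture.

# Hypothetical sketch of the two-step pipeline: (1) extract appearance
# features from each camera view with a shared backbone, (2) fuse the
# per-view features before classification. All sizes are illustrative.
import torch
import torch.nn as nn

class MultiViewFusion(nn.Module):
    def __init__(self, num_views: int, feat_dim: int = 512, num_classes: int = 33):
        super().__init__()
        # Step 1: a shared appearance-feature extractor applied to each view
        # (a toy stand-in for a real video/image backbone).
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(64, feat_dim),
        )
        # Step 2: fuse per-view features by concatenation + linear projection.
        self.fuse = nn.Linear(num_views * feat_dim, feat_dim)
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, views: torch.Tensor) -> torch.Tensor:
        # views: (batch, num_views, 3, H, W)
        b, v = views.shape[:2]
        feats = self.backbone(views.flatten(0, 1))  # (b * v, feat_dim)
        fused = self.fuse(feats.view(b, -1))        # (b, feat_dim)
        return self.classifier(fused)               # (b, num_classes)

model = MultiViewFusion(num_views=3)
logits = model(torch.randn(2, 3, 3, 224, 224))  # two clips, three camera views
print(logits.shape)  # torch.Size([2, 33])

Concatenation followed by a projection is only one simple fusion scheme; attention-based or learned weighting across views are common alternatives, and the paper's actual fusion mechanism is not specified in this abstract.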
