Research Library

Open Access
Knowledge-enhanced Multi-perspective Video Representation Learning for Scene Recognition
Author(s)
Xuzheng Yu,
Chen Jiang,
Wei Zhang,
Tian Gan,
Linlin Chao,
Jianan Zhao,
Yuan Cheng,
Qingpei Guo,
Wei Chu
Publication year: 2024
Abstract
With the explosive growth of video data in real-world applications, a comprehensive representation of videos becomes increasingly important. In this paper, we address the problem of video scene recognition, whose goal is to learn a high-level video representation to classify scenes in videos. Due to the diversity and complexity of video contents in realistic scenarios, this task remains a challenge. Most existing works identify scenes for videos only from visual or textual information in a temporal perspective, ignoring the valuable information hidden in single frames, while several earlier studies only recognize scenes for separate images in a non-temporal perspective. We argue that these two perspectives are both meaningful for this task and complementary to each other; meanwhile, externally introduced knowledge can also promote the comprehension of videos. We propose a novel two-stream framework to model video representations from multiple perspectives, i.e., temporal and non-temporal perspectives, and integrate the two perspectives in an end-to-end manner by self-distillation. Besides, we design a knowledge-enhanced feature fusion and label prediction method that contributes to naturally introducing knowledge into the task of video scene recognition. Experiments conducted on a real-world dataset demonstrate the effectiveness of our proposed method.
Language(s): English
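The abstract couples a temporal stream and a non-temporal (single-frame) stream through self-distillation. As an illustration only, the sketch below shows one common way such coupling is trained: each stream carries its own supervised cross-entropy loss, and a symmetric KL term on temperature-softened predictions lets the two streams teach each other. The function names, the symmetric-KL formulation, and the hyperparameters (`alpha`, temperature `t`) are assumptions for illustration, not details taken from the paper.

```python
import numpy as np

def softmax(z, t=1.0):
    """Temperature-scaled softmax over the last axis."""
    z = np.asarray(z, dtype=float) / t
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def kl(p, q, eps=1e-12):
    """KL divergence KL(p || q) between two probability vectors."""
    return float(np.sum(p * np.log((p + eps) / (q + eps)), axis=-1).mean())

def self_distillation_loss(logits_temporal, logits_frame, label,
                           alpha=0.5, t=2.0):
    """Illustrative joint loss for a two-stream scene classifier:
    supervised cross-entropy on each stream, plus a symmetric KL
    term that aligns the temporal and non-temporal predictions."""
    p_t = softmax(logits_temporal)
    p_f = softmax(logits_frame)
    # Supervised cross-entropy for both streams on the ground-truth label.
    ce = -np.log(p_t[..., label] + 1e-12) - np.log(p_f[..., label] + 1e-12)
    # Mutual (self-)distillation on temperature-softened distributions.
    soft_t = softmax(logits_temporal, t)
    soft_f = softmax(logits_frame, t)
    distill = kl(soft_t, soft_f) + kl(soft_f, soft_t)
    return float(ce) + alpha * distill
```

When the two streams already agree, the distillation term vanishes and the loss reduces to the two cross-entropies; disagreement between the perspectives is penalized in proportion to `alpha`.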

