Learning Compact Spatio-Temporal Features for Fast Content based Video Retrieval


Open Access

Author(s) - Vidit Kumar, Vikas Tripathi, Bhaskar Pant

Publication year - 2019

Publication title - International Journal of Innovative Technology and Exploring Engineering

Language(s) - English

Resource type - Journals

ISSN - 2278-3075

DOI - 10.35940/ijitee.b7847.129219

Subject(s) - computer science, hash function, search engine indexing, metadata, video tracking, upload, key (lock), information retrieval, video browsing, artificial intelligence, code (set theory), deep learning, image retrieval, exploit, key frame, frame (networking), multimedia, video processing, image (mathematics), world wide web, telecommunications, computer security, set (abstract data type), programming language

Videos are recorded and uploaded daily to sites such as YouTube and Facebook from devices such as mobile phones and digital cameras, with little or no associated metadata (semantic tags). This makes it extremely difficult to retrieve similar videos from metadata alone, without content based semantic search. Content based video retrieval is the problem of retrieving the videos most similar to a given query video, and it has a wide range of applications such as video browsing, content filtering, and video indexing. Traditional video level features are built from hand engineered key frame level features, which do not exploit the rich dynamics present in the video. In this paper we propose a fast content based video retrieval framework using compact spatio-temporal features learned by deep learning. Specifically, a deep CNN along with an LSTM is deployed to learn spatio-temporal representations of video. For fast retrieval, a binary code is generated by a hash learning component in the framework. For fast and effective learning of the hash code, the proposed framework is trained in two stages: the first stage learns the video dynamics, and the second stage learns the compact code using the temporal variation of the video learned in the first stage. The UCF101 dataset is used to test the proposed method, and the results are compared with those of other hashing methods. The results show that our approach improves performance over existing methods.
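
To make the retrieval step concrete, the following is a minimal sketch of how binary codes allow fast lookup by Hamming distance. It is not the authors' implementation: the abstract does not specify the hash layer, the code length, or the binarization rule, so thresholding at zero, the 8-bit code length, the feature values, and the names `binarize`, `hamming_distance`, and `retrieve` are all assumptions made for illustration. The real-valued vectors stand in for the outputs of a trained CNN and LSTM network.

```python
from typing import List, Sequence, Tuple


def binarize(features: Sequence[float], threshold: float = 0.0) -> int:
    """Pack a real-valued feature vector into an integer code, one bit per dimension.

    Thresholding at zero is an assumption; the abstract does not state the rule.
    """
    code = 0
    for value in features:
        code = (code << 1) | (1 if value > threshold else 0)
    return code


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions in which two codes differ."""
    return bin(a ^ b).count("1")


def retrieve(
    query_code: int,
    database: Sequence[Tuple[str, int]],
    top_k: int = 3,
) -> List[Tuple[str, int]]:
    """Return the top_k (video_id, distance) pairs nearest to the query code."""
    scored = [(vid, hamming_distance(query_code, code)) for vid, code in database]
    # Sort by distance, then by id so that ties are broken deterministically.
    scored.sort(key=lambda pair: (pair[1], pair[0]))
    return scored[:top_k]


# Hypothetical 8-dimensional network outputs for three stored videos.
database = [
    ("video_a", binarize([0.9, -0.2, 0.4, -0.7, 0.1, 0.8, -0.3, -0.5])),
    ("video_b", binarize([-0.6, 0.3, -0.1, 0.5, -0.9, -0.4, 0.7, 0.2])),
    ("video_c", binarize([0.8, -0.1, 0.6, -0.4, -0.2, 0.9, -0.6, 0.3])),
]

# Hypothetical output for the query video.
query = binarize([0.7, -0.3, 0.5, -0.6, 0.2, 0.6, -0.1, -0.4])

results = retrieve(query, database, top_k=2)
print(results)  # [('video_a', 0), ('video_c', 2)]
```

Comparing codes needs only an XOR and a bit count per stored video, which is why compact binary codes are attractive for large collections compared with distance computations over real-valued features.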
