Open Access
Visual-Syntactic Embedding for Video Captioning
Author(s) - Jesus Perez-Martin, Jorge Pérez, Benjamín Bustos
Publication year - 2021
Language(s) - English
Resource type - Conference proceedings
DOI - 10.52591/lxai202106259
Subject(s) - closed captioning, computer science, natural language processing, artificial intelligence, encoder, representation, embedding, speech recognition, image
Video captioning is the task of predicting a semantically and syntactically correct sequence of words given some context video. The most successful methods for video captioning depend strongly on the effectiveness of the semantic representations learned from visual models, but they often produce syntactically incorrect sentences, which harms their performance on standard datasets. We address this limitation by treating syntactic representation learning as an essential component of video captioning. We construct a visual-syntactic embedding by mapping into a common vector space a visual representation, which depends only on the video, and a syntactic representation, which depends only on the Part-of-Speech (POS) tagging structure of the video description. We integrate this joint representation into an encoder-decoder architecture that we call Visual-Semantic-Syntactic Aligned Network (SemSynAN), which guides the decoder (text generation stage) by aligning temporal compositions of visual, semantic, and syntactic representations. We tested our proposed architecture, obtaining state-of-the-art results on two widely used video captioning datasets. This is a short version of a paper recently published at a computer vision conference. The complete reference has been redacted to fulfill the double-blind restriction.
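The sketch below illustrates the joint-embedding idea the abstract describes: a video-level feature and the POS-tag sequence of a caption are projected into a common vector space and aligned with a max-margin ranking loss. This is a minimal PyTorch illustration, not the authors' implementation; the module names, dimensions, the GRU-based POS encoder, and the loss formulation are all assumptions made for the example.

```python
# Minimal sketch (not the authors' code) of a visual-syntactic joint
# embedding: visual and POS-based syntactic representations are mapped
# into one vector space and aligned so that matching pairs score higher
# than mismatched ones. All dimensions and names here are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class VisualSyntacticEmbedding(nn.Module):
    def __init__(self, visual_dim=2048, pos_vocab=50, pos_emb_dim=128,
                 hidden_dim=256, joint_dim=512):
        super().__init__()
        # Projects a (pooled) video feature into the joint space.
        self.visual_proj = nn.Linear(visual_dim, joint_dim)
        # Encodes the caption's POS-tag sequence, then projects the
        # final hidden state into the same joint space.
        self.pos_embedding = nn.Embedding(pos_vocab, pos_emb_dim)
        self.pos_encoder = nn.GRU(pos_emb_dim, hidden_dim, batch_first=True)
        self.syntax_proj = nn.Linear(hidden_dim, joint_dim)

    def forward(self, video_feats, pos_tags):
        # video_feats: (batch, visual_dim); pos_tags: (batch, seq_len)
        v = F.normalize(self.visual_proj(video_feats), dim=-1)
        _, h = self.pos_encoder(self.pos_embedding(pos_tags))
        s = F.normalize(self.syntax_proj(h[-1]), dim=-1)
        return v, s

def ranking_loss(v, s, margin=0.2):
    # Max-margin contrastive loss over in-batch negatives: the cosine
    # similarity of each true (video, POS structure) pair should exceed
    # that of every mismatched pair by at least `margin`.
    scores = v @ s.t()                                 # (batch, batch)
    pos = scores.diag().unsqueeze(1)                   # true-pair scores
    cost_s = (margin + scores - pos).clamp(min=0)      # video -> wrong syntax
    cost_v = (margin + scores - pos.t()).clamp(min=0)  # syntax -> wrong video
    mask = torch.eye(scores.size(0), dtype=torch.bool, device=scores.device)
    return cost_s.masked_fill(mask, 0).sum() + cost_v.masked_fill(mask, 0).sum()
```

In an architecture like SemSynAN, a representation learned this way can be predicted from the video alone and used, together with visual and semantic features, to guide the decoder during text generation, as the abstract describes.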
