A Convolutional Temporal Encoder for Video Caption Generation
Author(s) - Qingle Huang, Zicheng Liao
Publication year - 2017
Language(s) - English
Resource type - Conference proceedings
DOI - 10.5244/c.31.126
Subject(s) - computer science , encoder , convolutional code , artificial intelligence , decoding methods , telecommunications , operating system
We propose a convolutional temporal encoding network for video sequence embedding and caption generation. Mainstream video captioning work is based on recurrent encoders of various forms (e.g. LSTMs and hierarchical encoders). In this work, we instead propose a multi-layer convolutional neural network encoder. At the core of this encoder is a gated linear unit (GLU), which applies a linear convolutional transformation to the input followed by a nonlinear gating, and has demonstrated strong performance in natural language modeling. Our model builds on this unit for video encoding and integrates several recent techniques, including batch normalization, skip connections, and soft attention. Experiments on two large-scale benchmark datasets (MSVD and M-VAD) produce strong results and demonstrate the effectiveness of our model.
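As an illustration of the gated linear unit the abstract describes (a linear convolution over the temporal axis modulated by a sigmoid gate, combined with batch normalization and a skip connection), here is a minimal sketch in PyTorch; the class name, layer choices, and tensor shapes are assumptions for exposition, not the authors' implementation.

```python
# Hypothetical sketch of a GLU-based temporal encoding block (not the authors' code).
import torch
import torch.nn as nn

class GLUBlock(nn.Module):
    """Gated linear unit over the time axis: out = (X*W + b) * sigmoid(X*V + c)."""
    def __init__(self, channels, kernel_size=3):
        super().__init__()
        # A single convolution produces both the linear path and the gate (2x channels).
        self.conv = nn.Conv1d(channels, 2 * channels,
                              kernel_size, padding=kernel_size // 2)
        self.norm = nn.BatchNorm1d(channels)

    def forward(self, x):
        # x: (batch, channels, time) -- e.g. per-frame CNN features stacked over time.
        a, b = self.conv(x).chunk(2, dim=1)   # linear transform and gate
        out = a * torch.sigmoid(b)            # gated linear unit
        return self.norm(out) + x             # batch normalization + skip connection

# Usage: encode a sequence of 512-dim frame features of length 30.
features = torch.randn(8, 512, 30)
encoded = GLUBlock(512)(features)            # shape preserved: (8, 512, 30)
```

Several such blocks can be stacked to form the multi-layer convolutional encoder, with soft attention applied by the caption decoder over the resulting temporal features.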