Open Access
A Convolutional Temporal Encoder for Video Caption Generation
Author(s) - Qingle Huang, Zicheng Liao
Publication year - 2017
Language(s) - English
Resource type - Conference proceedings
DOI - 10.5244/c.31.126
Subject(s) - computer science, encoder, convolutional code, artificial intelligence, decoding methods, telecommunications, operating system
We propose a convolutional temporal encoding network for video sequence embedding and caption generation. Mainstream video captioning work is based on recurrent encoders of various forms (e.g. LSTMs and hierarchical encoders). In this work, we propose a multi-layer convolutional neural network encoder instead. At the core of this encoder is the gated linear unit (GLU), which performs a linear convolutional transformation of the input modulated by a nonlinear gate and has demonstrated superior performance in natural language modeling. Our model builds on this unit for video encoding and integrates several up-to-date techniques, including batch normalization, skip connections, and soft attention. Experiments on two large-scale benchmark datasets (MSVD and M-VAD) yield strong results and demonstrate the effectiveness of our model.
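The gated linear unit at the core of the encoder computes a linear temporal convolution of the input and multiplies it elementwise by a sigmoid gate computed from a second convolution, i.e. GLU(X) = (X∗W + b) ⊙ σ(X∗V + c). The following is a minimal NumPy sketch of that operation over a sequence of frame features; the function and parameter names (`glu_temporal`, `W_a`, `W_b`, kernel width `k`) are illustrative, not from the paper, and real implementations would use an optimized convolution primitive.

```python
import numpy as np

def glu_temporal(x, W_a, W_b, b_a, b_b):
    """Gated linear unit applied along the temporal axis (illustrative sketch).

    x:        (T, d_in)       sequence of T frame feature vectors
    W_a, W_b: (k, d_in, d_out) convolution kernels of temporal width k
    b_a, b_b: (d_out,)         biases for the linear path and the gate
    Returns:  (T - k + 1, d_out) gated features A * sigmoid(B)
    """
    k = W_a.shape[0]
    T = x.shape[0]
    # Linear path A = x * W_a + b_a (valid convolution, no padding)
    a = np.stack([
        sum(x[t + j] @ W_a[j] for j in range(k)) + b_a
        for t in range(T - k + 1)
    ])
    # Gate path B = x * W_b + b_b, squashed to (0, 1) by the sigmoid
    b = np.stack([
        sum(x[t + j] @ W_b[j] for j in range(k)) + b_b
        for t in range(T - k + 1)
    ])
    return a * (1.0 / (1.0 + np.exp(-b)))
```

Because the gate is bounded in (0, 1), each output channel is a damped copy of the linear path, which is what gives the GLU its simple, well-conditioned gradient compared with tanh-gated units.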
