
Cascade recurrent neural network for image caption generation
Author(s) - Wu Jie, Hu Haifeng
Publication year - 2017
Publication title - Electronics Letters
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.375
H-Index - 146
ISSN - 1350-911X
DOI - 10.1049/el.2017.3159
Subject(s) - recurrent neural network, computer science, cascade, artificial intelligence, image (mathematics), artificial neural network, closed captioning, word (group theory), deep learning, embedding, mathematics, chemistry, geometry, chromatography
A new cascade recurrent neural network (CRNN) for image caption generation is proposed. Unlike the classical multimodal recurrent neural network, which uses a single network to extract unidirectional syntactic features, CRNN adopts a cascade network that learns visual‐language interactions in both the forward and backward directions and can therefore exploit the deep semantic contexts contained in the image. In the proposed framework, two embedding layers are constructed for dense word representation, and a new stacked Gated Recurrent Unit is designed for learning image‐word mappings. The effectiveness of the CRNN model is verified on the commonly used MSCOCO dataset, where the results indicate that CRNN achieves better performance than state‐of‐the‐art image captioning methods such as Google NIC and the multimodal recurrent neural network.
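
The abstract gives no equations, so the following is only a minimal sketch of the two ideas it names: stacking GRU layers and reading a word sequence in both the forward and backward directions. It is not the authors' CRNN. The names `GRUCell`, `run_stacked_gru` and `encode_both_directions` are hypothetical, the weights are random rather than trained, image features are left out, and joining the two directions by concatenation is an assumption, since the abstract does not say how they are combined.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class GRUCell:
    """A standard GRU cell without bias terms (illustrative, untrained)."""

    def __init__(self, input_size, hidden_size, rng):
        self.hidden_size = hidden_size
        # One input matrix W and one recurrent matrix U per gate:
        # "z" = update gate, "r" = reset gate, "h" = candidate state.
        self.W = {g: rand_matrix(hidden_size, input_size, rng) for g in "zrh"}
        self.U = {g: rand_matrix(hidden_size, hidden_size, rng) for g in "zrh"}

    def step(self, x, h):
        z = [sigmoid(a + b) for a, b in
             zip(matvec(self.W["z"], x), matvec(self.U["z"], h))]
        r = [sigmoid(a + b) for a, b in
             zip(matvec(self.W["r"], x), matvec(self.U["r"], h))]
        reset_h = [ri * hi for ri, hi in zip(r, h)]
        cand = [math.tanh(a + b) for a, b in
                zip(matvec(self.W["h"], x), matvec(self.U["h"], reset_h))]
        # Blend the old state and the candidate state with the update gate.
        return [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]


def run_stacked_gru(layers, inputs):
    """Feed a sequence through stacked cells; return top-layer states per step."""
    states = [[0.0] * layer.hidden_size for layer in layers]
    outputs = []
    for x in inputs:
        for i, layer in enumerate(layers):
            states[i] = layer.step(x, states[i])
            x = states[i]  # the output of one layer is the input of the next
        outputs.append(x)
    return outputs


def encode_both_directions(fwd_layers, bwd_layers, embeddings):
    """Run one stack left-to-right and one right-to-left, then concatenate.

    Concatenation is an assumption made for this sketch.
    """
    fwd = run_stacked_gru(fwd_layers, embeddings)
    bwd = run_stacked_gru(bwd_layers, embeddings[::-1])[::-1]
    return [f + b for f, b in zip(fwd, bwd)]


# Toy usage: 5 "words", each a 4-dimensional embedding; hidden size 3.
rng = random.Random(0)
forward_stack = [GRUCell(4, 3, rng), GRUCell(3, 3, rng)]
backward_stack = [GRUCell(4, 3, rng), GRUCell(3, 3, rng)]
sentence = [[rng.uniform(-1.0, 1.0) for _ in range(4)] for _ in range(5)]
features = encode_both_directions(forward_stack, backward_stack, sentence)
```

Here `features` holds one vector per word, each combining a forward-context state and a backward-context state. A captioning model would additionally condition on image features and predict the next word from such states; those parts are outside this sketch.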