A Survey on Image Encoders and Language Models for Image Captioning
Author(s) - Himanshu Sharma
Publication year - 2021
Publication title - IOP Conference Series: Materials Science and Engineering
Language(s) - English
Resource type - Journals
eISSN - 1757-899X
pISSN - 1757-8981
DOI - 10.1088/1757-899x/1116/1/012118
Subject(s) - closed captioning, computer science, image (mathematics), artificial intelligence, natural language, encoder, convolutional neural network, encoding (memory), sentence, natural language processing, computer vision, operating system
Image captioning is the task of generating a natural language description of a given image. An image captioning method aims to identify the significant objects present in an image together with the relationships between these objects, and to describe the image in a syntactically and semantically correct sentence. A convolutional neural network (CNN) is applied to encode the image, and language models (such as RNN and LSTM) are employed to produce the natural language description. This paper discusses the image encoders and language models used by state-of-the-art image captioning models.
