
Generate Detailed Captions of an Image using Deep Learning
Author(s) -
Khan Shayaan Shakeel,
Masalawala Murtaza Shabbir,
Qazi Faizan Ahmed,
Nafisa Mapari
Publication year - 2022
Publication title -
International Journal for Research in Applied Science and Engineering Technology
Language(s) - English
Resource type - Journals
ISSN - 2321-9653
DOI - 10.22214/ijraset.2022.41309
Subject(s) - computer science , artificial intelligence , image (mathematics) , extension (predicate logic) , feature (linguistics) , deep learning , computer vision , pattern recognition (psychology) , feature extraction , philosophy , linguistics , programming language
This paper presents an implementation of image caption generation using deep learning. Image captioning is a representative computer vision task, a field whose main aim is scene understanding. The model combines a convolutional neural network (CNN) with a long short-term memory (LSTM) network. It extends the CNN-RNN model, which suffers from the vanishing gradient problem. Xception, a CNN trained on the ImageNet dataset, is used for image feature extraction. The features extracted by Xception are fed as input to the LSTM, which in turn generates the caption for the image. The Flickr_8k dataset is used for training and testing.
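The abstract does not say how the caption is produced from the LSTM output, so the following is only a minimal sketch of one common choice, greedy word-by-word decoding. The names `greedy_caption`, `toy_decoder`, the `<start>`/`<end>` tokens, and the probability table are all hypothetical stand-ins for illustration; `toy_decoder` replaces a trained LSTM and ignores the image features that a real decoder would condition on.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical boundary tokens; the paper does not name its own.
START, END = "<start>", "<end>"


def greedy_caption(
    features: Sequence[float],
    next_word_probs: Callable[[Sequence[float], List[str]], Dict[str, float]],
    max_len: int = 20,
) -> str:
    """Build a caption by repeatedly picking the most probable next word."""
    words = [START]
    for _ in range(max_len):
        # The decoder sees the image features plus the words generated so far.
        probs = next_word_probs(features, words)
        best = max(probs, key=probs.get)
        if best == END:
            break
        words.append(best)
    return " ".join(words[1:])  # drop the start token


def toy_decoder(features: Sequence[float], words: List[str]) -> Dict[str, float]:
    """Stand-in for a trained LSTM: looks only at the last word (assumption)."""
    table = {
        START: {"a": 0.9, "dog": 0.1},
        "a": {"dog": 0.8, "a": 0.2},
        "dog": {"runs": 0.7, END: 0.3},
        "runs": {END: 0.9, "a": 0.1},
    }
    return table[words[-1]]


# A short made-up feature vector in place of real Xception features.
caption = greedy_caption([0.1, 0.5, 0.2], toy_decoder)
print(caption)  # a dog runs
```

In a real system the decoder function would wrap the trained LSTM, and the feature vector would come from the CNN encoder rather than being written by hand.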