Automated image captioning with deep neural networks

Abstract
Automatically generating natural language descriptions of an image's content is a complex task. Although it comes naturally to humans, it remains difficult for machines, yet achieving it would markedly change how machines interact with us. Recent advances in object recognition have enabled models that caption images based on the relations between the objects they contain. In this research project, we demonstrate current techniques and algorithms for automated caption generation from images using deep neural networks. The model follows an encoder-decoder strategy inspired by language-translation models based on Recurrent Neural Networks (RNNs). Whereas the translation model uses an RNN for both encoding and decoding, this model uses a Convolutional Neural Network (CNN) for encoding and an RNN for decoding, a combination better suited to generating a caption from an image. The model takes an image as input and produces an ordered sequence of words, which is the caption.
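The encoder-decoder pipeline described above can be sketched in miniature as follows. This is a toy illustration, not the paper's trained model: the `encode` function stands in for a CNN by collapsing the image to a single feature value, and the hand-written transition tables stand in for learned RNN dynamics; the vocabulary, thresholds, and captions are all invented for the example.

```python
# Toy sketch of the encoder-decoder captioning loop (illustrative assumptions only;
# a real system uses a trained CNN encoder and a trained RNN decoder).

def encode(image):
    """CNN-encoder stand-in: reduce the image to one scalar feature.
    A real encoder would produce a learned high-dimensional feature vector."""
    pixels = [p for row in image for p in row]
    return sum(pixels) / len(pixels)

# RNN-decoder stand-in: the next word depends on the previous word,
# conditioned on the image feature (here, a crude bright-vs-dark split).
TRANSITIONS = {
    "bright": {"<start>": "a", "a": "dog", "dog": "on", "on": "grass",
               "grass": "<end>"},
    "dark":   {"<start>": "a", "a": "cat", "cat": "indoors",
               "indoors": "<end>"},
}

def decode(feature, max_len=10):
    """Greedily emit words until the end token, producing an ordered sequence."""
    table = TRANSITIONS["bright" if feature > 0.5 else "dark"]
    words, token = [], "<start>"
    for _ in range(max_len):
        token = table[token]
        if token == "<end>":
            break
        words.append(token)
    return " ".join(words)

image = [[0.9, 0.8], [0.7, 1.0]]  # toy 2x2 "image" of pixel intensities
print(decode(encode(image)))       # -> "a dog on grass"
```

The key structural point the sketch preserves is the division of labour: one component summarises the image into features, and a separate sequential component unrolls those features into an ordered word sequence terminated by an end token.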
Funding Information
  • International Islamic University Malaysia (RIGS16-346-0510)