Problem: Image Captioning
Given an image, our goal is to generate a caption.
- Input: Image
- Output: Caption for image
Solution
For this problem, we use will use InceptionV3 (which is pre-trained on Imagenet) to classify each image. We will extract features from the last convolutional layer. The RNN (here GRU) attends over the image to predict the next word.

Experimental Results







