Problem: Image Captioning
Given an image, our goal is to generate a caption.
- Input: Image
- Output: Caption for image
Solution
For this problem, we use will use InceptionV3 (which is pre-trained on Imagenet) to classify each image. We will extract features from the last convolutional layer. The RNN (here GRU) attends over the image to predict the next word.