The model takes the image as an input and goes through the CNN model for feature extraction and then passes through the LSTM model for caption generation.

The model takes the image as an input and goes through the CNN model for feature extraction and then passes through the LSTM model for caption generation.
brown dog is running through theĀ grass
The model will train over the training data of 30000 samples.