selfmaker / image_caption

	---
	license: cc-by-nc-nd-4.0
	tags:
	- Image
	- Captioning
	- RESNET-152
	- LSTM
	---

	# Introduction

	This model follows the architecture proposed in the book "Mastering PyTorch".
	It combines a CNN encoder with an LSTM decoder.

	The CNN encoder is based on a pretrained ResNet-152. The last layer of the ResNet is replaced by a layer that outputs a 256-element embedding vector.
	The LSTM decoder takes inputs of size 256, has a hidden layer of size 512, and produces outputs whose size is the vocabulary size.

	The model was trained purely as a learning exercise, so its performance remains fairly average.

	# Training procedure

	For the sake of the exercise, the model has been trained for only 5 epochs.

	It has been trained on the COCO dataset.
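
For illustration, a 5-epoch training loop for this kind of captioner might look like the sketch below. The card does not state the loss, optimizer, learning rate, or batching, so the cross-entropy loss, Adam, `lr=1e-3`, and padding index 0 are all assumptions. `TinyCaptioner` is a hypothetical stand-in that lets the loop run without ResNet-152 or COCO.

```python
import torch
import torch.nn as nn


class TinyCaptioner(nn.Module):
    """Hypothetical stand-in model: maps (images, tokens) to vocabulary logits."""

    def __init__(self, vocab_size=20, embed_size=8, hidden_size=16):
        super().__init__()
        self.img = nn.Linear(3 * 8 * 8, embed_size)
        self.emb = nn.Embedding(vocab_size, embed_size)
        self.lstm = nn.LSTM(embed_size, hidden_size, batch_first=True)
        self.out = nn.Linear(hidden_size, vocab_size)

    def forward(self, images, tokens):
        first = self.img(images.flatten(1)).unsqueeze(1)
        hidden, _ = self.lstm(torch.cat([first, self.emb(tokens)], dim=1))
        return self.out(hidden)


def train_captioner(model, batches, epochs=5, lr=1e-3, pad_id=0):
    """Train with teacher forcing and return the mean loss of each epoch."""
    criterion = nn.CrossEntropyLoss(ignore_index=pad_id)  # assumed loss
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)  # assumed optimizer
    history = []
    model.train()
    for _ in range(epochs):
        total = 0.0
        for images, captions in batches:
            # Feed all tokens but the last. The image fills step 0, so the
            # logits line up with the full caption used as the target.
            logits = model(images, captions[:, :-1])
            loss = criterion(logits.reshape(-1, logits.size(-1)),
                             captions.reshape(-1))
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            total += loss.item()
        history.append(total / len(batches))
    return history


# Random tensors stand in for COCO images and tokenized captions.
batches = [(torch.randn(4, 3, 8, 8), torch.randint(1, 20, (4, 6)))
           for _ in range(3)]
losses = train_captioner(TinyCaptioner(), batches, epochs=5)
```

`losses` holds one mean loss value per epoch, which is enough to check that training is progressing.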

	# Support

	If you like my work, feel free to support me here:
	[buymeacoffee.com/selfmaker](https://buymeacoffee.com/selfmaker)