---
license: cc-by-nc-nd-4.0
tags:
- Image
- Captioning
- RESNET-152
- LSTM
---

# Introduction

This model follows the architecture proposed in the book *Mastering PyTorch*.
It pairs a CNN encoder with an LSTM decoder.

The CNN encoder is built on a pretrained ResNet-152. The last layer of the ResNet is replaced by a layer that maps the image features to a 256-element embedding vector.
The LSTM decoder takes 256-element inputs, has a hidden layer of size 512, and has an output layer sized to the vocabulary.

The model was trained purely as a learning exercise, so its captioning performance is only average.

# Training procedure

As this was only an exercise, the model was trained on the COCO dataset for just 5 epochs.
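
A training run like this could look roughly like the loop below. This is a hedged sketch, not the author's script. The `train` function, the Adam optimiser and learning rate, the cross-entropy loss with teacher forcing, and the batch format are all assumptions. Loading COCO and building the vocabulary are left out; loading can be done, for example, through `torchvision.datasets.CocoCaptions`, which needs `pycocotools` and the downloaded annotations.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence


def train(encoder, decoder, data_loader, num_epochs=5, lr=1e-3, device="cpu"):
    """Train an image-captioning encoder/decoder pair with teacher forcing.

    Assumed (hypothetical) interfaces:
      * data_loader yields (images, captions, lengths) batches, with captions
        padded and sorted by decreasing length;
      * encoder(images) returns one embedding per image;
      * decoder(features, captions, lengths) returns logits for every
        non-padded time step, in packed-sequence order.
    """
    criterion = nn.CrossEntropyLoss()
    # Only optimise trainable parameters (e.g. skip a frozen CNN backbone).
    params = [p for p in list(encoder.parameters()) + list(decoder.parameters())
              if p.requires_grad]
    optimizer = torch.optim.Adam(params, lr=lr)
    encoder.to(device).train()
    decoder.to(device).train()
    history = []
    for epoch in range(num_epochs):
        total = 0.0
        for images, captions, lengths in data_loader:
            images, captions = images.to(device), captions.to(device)
            # Targets: the ground-truth words, packed in the same order as the logits.
            targets = pack_padded_sequence(captions, lengths, batch_first=True).data
            logits = decoder(encoder(images), captions, lengths)
            loss = criterion(logits, targets)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            total += loss.item()
        history.append(total / len(data_loader))
        print(f"epoch {epoch + 1}/{num_epochs}: mean loss {history[-1]:.4f}")
    return history
```

Consider a `torch.utils.data.DataLoader` whose collate function pads the captions and sorts them by length. With such a loader, `train(encoder, decoder, loader, num_epochs=5)` would correspond to the 5-epoch run described above and return the mean loss per epoch.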

# Support

If you like my work, feel free to support me here:
[buymeacoffee.com/selfmaker](https://buymeacoffee.com/selfmaker)