Vocab_size of the model configuration is incorrect

#18

by robkirk - opened Aug 31, 2022

edited Aug 31, 2022

In the model configuration for this (and other OPT models), vocab_size is 50272, but the tokenizer has a vocabulary size of 50265, which matches the original vocabulary here and the one on Hugging Face here. Could this be updated somehow (although I realise that could mess with checkpoints etc.)?

There's this issue on the transformers GitHub referencing the same thing.

patrickvonplaten

Aug 31, 2022

Hey @robkirk,

Good question! I think you can find the answer here: https://github.com/huggingface/transformers/issues/17431#issuecomment-1224231170 (it was on another GitHub issue)
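
For anyone who wants to reproduce the numbers from the original post, here is a minimal sketch. The helper names `unused_embedding_rows` and `load_sizes` are made up for illustration, and the model id passed to `load_sizes` is whatever OPT checkpoint you are inspecting (this thread does not name one in its text). It only measures the gap between the two sizes; it does not explain why the gap exists, which is what the linked comment is for.

```python
def unused_embedding_rows(config_vocab_size: int, tokenizer_size: int) -> int:
    """Return how many embedding rows have no matching tokenizer entry.

    Raises ValueError if the tokenizer could emit ids the embedding
    matrix cannot index, which is the case that would actually break.
    """
    if tokenizer_size > config_vocab_size:
        raise ValueError(
            f"tokenizer has {tokenizer_size} entries but the config only "
            f"allows {config_vocab_size} embedding rows"
        )
    return config_vocab_size - tokenizer_size


def load_sizes(model_id: str) -> tuple[int, int]:
    """Fetch (config.vocab_size, len(tokenizer)) from the Hub.

    Hypothetical helper: needs `transformers` installed and network access,
    so the import is kept local and nothing here runs at import time.
    """
    from transformers import AutoConfig, AutoTokenizer

    config = AutoConfig.from_pretrained(model_id)
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    return config.vocab_size, len(tokenizer)


# Numbers quoted in the original post, used here so the sketch runs offline.
gap = unused_embedding_rows(config_vocab_size=50272, tokenizer_size=50265)
print(gap)  # 7
```

Since the tokenizer size is the smaller of the two, every token id it produces is still a valid row index into the embedding matrix; the extra 7 rows are simply never looked up by this tokenizer.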
