Fp32 vs fp16

#12

by wiccanmind - opened Nov 23, 2023

Nov 23, 2023

Thank you for contributing to this excellent model.
I have a question: the model is trained using the float32 data type, but due to resource constraints I am running inference in fp16. Does this significantly affect the quality of the model's output?
Currently, I find that it does not perform as well as Orca 1 when running inference in fp16.

ari9dam

Nov 27, 2023

The model is trained with bfloat16. With fp16 inference you might see some loss in quality, but that affects both Orca 1 and Orca 2. You can see the inference code here: https://huggingface.co/spaces/ari9dam/Orca-2-13B
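To illustrate why running a bfloat16-trained model in fp16 can cost some quality, here is a minimal sketch using only the Python standard library. It is not taken from the linked inference code. The helper names `to_fp16` and `to_bf16` are hypothetical, and the bfloat16 emulation truncates rather than rounds, which is a simplification. The point is that fp16 has a much smaller range (values above roughly 65504 overflow), while bfloat16 keeps the float32 range at lower precision.

```python
import struct


def to_fp16(x: float) -> float:
    """Round x to IEEE half precision (fp16); out-of-range values become +/-inf."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        # struct refuses to pack values beyond the fp16 range; treat that as overflow to inf.
        return float("inf") if x > 0 else float("-inf")


def to_bf16(x: float) -> float:
    """Emulate bfloat16 by keeping the top 16 bits of the float32 encoding.

    Assumption: plain truncation, not the round-to-nearest used by real hardware.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]


for value in (1.001, 70000.0):
    print(f"{value}: fp16={to_fp16(value)}, bf16={to_bf16(value)}")
```

A large value survives in bfloat16 (with coarse precision) but overflows in fp16, whereas a value close to 1 is represented more finely in fp16 than in bfloat16.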

(Important: use the slow version of the tokenizer.)
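As a rough sketch of what the advice above could look like with the Hugging Face `transformers` library (this is not the code from the linked Space): the model is loaded in bfloat16 and the slow tokenizer is requested with `use_fast=False`. The repository name `microsoft/Orca-2-13b` and the function name `load_orca2` are assumptions for illustration and are not stated in this thread.

```python
# Assumed model repository; replace with the checkpoint you actually use.
MODEL_ID = "microsoft/Orca-2-13b"


def load_orca2(model_id: str = MODEL_ID):
    """Load tokenizer and model, preferring bfloat16 weights and the slow tokenizer."""
    # Imported lazily so the module can be imported without torch/transformers installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # use_fast=False selects the slow (Python/SentencePiece) tokenizer.
    tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=False)

    # torch_dtype=torch.bfloat16 halves memory relative to float32;
    # on hardware without bfloat16 support, torch.float16 is the fallback discussed here.
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
    return tokenizer, model
```

Calling `load_orca2()` downloads the weights on first use, so it needs network access and enough memory for the 13B checkpoint.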

wiccanmind

Nov 28, 2023

Thank you very much for your response.
As I see in the config.json file, Orca 2 uses "torch_dtype": "float32", whereas Orca 1 uses "torch_dtype": "bfloat16". In addition, the total weight file size of Orca 1 is 26GB, while that of Orca 2 is 53GB. This implies that Orca 2 stores its weights in a data type twice the size of the one Orca 1 uses. So I still do not quite understand your statement 'The model is trained with bfloat16.'
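The size argument above can be checked with simple arithmetic. The sketch below assumes the "13B" in the model name means exactly 13 billion parameters, which is only an approximation, and the function name `checkpoint_size_gb` is hypothetical. Note that this describes how the checkpoint is stored; on its own it does not settle which data type was used during training.

```python
# Bytes needed to store one parameter in each data type.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "float16": 2}

# Assumption: "13B" is taken as exactly 13e9 parameters.
ASSUMED_PARAMS = 13_000_000_000


def checkpoint_size_gb(num_params: int, dtype: str) -> float:
    """Approximate size of the stored weights in GB (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


for dtype in ("bfloat16", "float32"):
    print(f"{dtype}: ~{checkpoint_size_gb(ASSUMED_PARAMS, dtype):.0f} GB")
```

Under that assumption the two-byte format comes to about 26 GB and the four-byte format to about 52 GB, which is close to the 26GB and 53GB file sizes reported in this post.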
