'use_cache: false' reduces tokens/sec significantly

#10

by Astris - opened Aug 19, 2023

Aug 19, 2023

I saw a 3x reduction in tokens/sec with the cache disabled, compared to enabled. I don't know why it was disabled, but given the difference, it might be worth enabling it by default. I used the Hugging Face loader in text-generation-webui and ran the model on a 3090.
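For context, `use_cache` controls whether the model keeps the attention key/value tensors from earlier steps (the KV cache) and reuses them. It does not recompute them for the whole sequence on every new token. The sketch below is a simplified, hypothetical model that only counts how many token positions are fed through the network during generation. `positions_processed` is an illustrative helper, not part of any library. It measures work, not wall-clock time, so it does not predict the exact speedups reported in this thread.

```python
def positions_processed(prompt_len: int, new_tokens: int, use_cache: bool) -> int:
    """Count token positions fed through the model while generating new_tokens.

    Simplified model (an assumption, not a benchmark): one forward pass per
    generated token, no batching, no sliding window.
    """
    if new_tokens <= 0:
        return 0
    if use_cache:
        # Step 1 runs the whole prompt; every later step feeds only the
        # newest token and reuses cached keys/values for the rest.
        return prompt_len + (new_tokens - 1)
    # Without a cache, step k (1-based) re-feeds the prompt plus the
    # k - 1 tokens generated so far.
    return sum(prompt_len + k - 1 for k in range(1, new_tokens + 1))


for flag in (True, False):
    # 512-token prompt, 256 new tokens: 767 vs 163712 positions
    print(f"use_cache={flag}: {positions_processed(512, 256, flag)} positions")
```

The uncached count grows quadratically with output length while the cached one grows linearly, which is why leaving the cache off gets worse the longer the generation runs.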

robotwalrus

Sep 5, 2023

The config.json file already has use_cache set to true. When I loaded the model in textgen, it stayed true. Is there anything special about your setup?

Gryphe

Owner Sep 5, 2023

To clarify, I only fixed this yesterday. (I kept forgetting)

robotwalrus

Sep 5, 2023

Oh! I should have checked my local copy before commenting; my cache was set to false. Got a nice little speed increase: not 3x, but from 7 it/s to 11 it/s on a 4090. Thanks, metaprotium, I wouldn't have known unless you posted. And thanks for the model, Gryphe, it's seriously awesome.
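If you downloaded the model before the fix, your local config.json may still say `"use_cache": false`. Below is a minimal sketch for checking and flipping it, assuming (as this thread suggests) that the loader picks up `use_cache` from the model's config.json. `enable_use_cache` is a hypothetical helper, and the path in the usage line is a placeholder for wherever your copy lives.

```python
import json
from pathlib import Path


def enable_use_cache(config_path: str) -> bool:
    """Set "use_cache": true in a local config.json.

    Returns True if the file was changed, False if the cache was already enabled.
    """
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    if config.get("use_cache") is True:
        return False  # already enabled, leave the file untouched
    config["use_cache"] = True
    # Rewrite the whole file; other keys are preserved as loaded.
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return True


# Hypothetical path: point this at your own local copy of the model.
# enable_use_cache("models/your-model/config.json")
```

Reload the model afterwards; an already-loaded model keeps the settings it was loaded with.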

Astris changed discussion status to closed Sep 23, 2023
