To QAT or not?

#10

by yano2mch - opened 18 days ago

Discussion

yano2mch

18 days ago

I'm curious since QAT stuff is coming out, your thoughts on it?

And do you think training off it instead would be worthwhile going forward?

FrenzyBiscuit

Ready.Art org 18 days ago

Money.

FrenzyBiscuit

Ready.Art org 18 days ago

The 31B tune of Serenity at two epochs was ~$40. Omega is probably going to be in the range of $100 if I do it on 31B.

QAT (from my very basic understanding) requires FFT (full fine-tune) training, which is expensive.

Disya

18 days ago

Nothing forbids training QAT with a LoRA. The money problem only arises if the LoRA is already trained; then yes, you will have to spend money on training again.

Theory-of-mind

18 days ago

google/diffusiongemma-26B-A4B-it How about this? XD

FrenzyBiscuit

Ready.Art org 18 days ago

> Nothing forbids training QAT with a LoRA. The money problem only arises if the LoRA is already trained; then yes, you will have to spend money on training again.

I'll look into this more for future tunes, then, but from what I've heard on the Discord I frequent, QAT models tend to be dumb.

I'll need to actually try QAT and see for myself one of these days.

FrenzyBiscuit

Ready.Art org 18 days ago

> google/diffusiongemma-26B-A4B-it How about this? XD

FrenzyBiscuit

Ready.Art org 18 days ago

I'll probably tune the new 26B once Axolotl support has matured. 26B models aren't that expensive to train.

Theory-of-mind

18 days ago

Overall, it's a bit dumber than 12B; if I were measuring in "Gemmas," I'd say 9B. The claimed speed is 3.5 times that of the autoregressive 26B-A4B (though no one complained about its speed anyway). This would be good for older PCs like mine (which don't have all those CUDA cores), but I'm not sure the diffusion model will perform decently without a GPU. For reference, it takes me about 30 minutes to render a single 768x1152 image using the 6B model in Q8_0 with 8 denoising steps. I'm just curious to try the diffusion model in RP.

yano2mch

18 days ago

> I'll look into this more for future tunes, then, but from what I've heard on the Discord I frequent, QAT models tend to be dumb.

> QAT (from my very basic understanding) requires FFT (full fine-tune) training, which is expensive.
> The 31B tune of Serenity at two epochs was ~$40. Omega is probably going to be in the range of $100 if I do it on 31B.

> I'll probably tune the new 26B once Axolotl support has matured. 26B models aren't that expensive to train.

Gotcha, I was asking purely about viability, not considering cost. And if QAT tends to make models dumber, it's probably best to avoid it for now.

> but I'm not sure the diffusion model will perform decently without a GPU. At least, it takes me about 30 minutes to render a single 768x1152 image using the 6B model in Q8_0 with 8 denoising steps. Just curious to try the diffusion model in RP

With SDXL it tends to take 5 minutes running on a 3060.

Part of the idea with diffusion in LLaDA or the like is that each token gets a confidence rating, so the low-confidence tokens are the ones that get rerolled while the higher-confidence ones are kept. So it should be faster... Plus, for image generation I think a token covers 2x6 pixels, so you're probably looking at ~74k tokens. If we assume the same cost per token for text, it should be effectively ~40 t/s on your config. Though quality of replies is still the bigger factor.

But that's a guess, and my math could be totally wrong (768x1152/12 = 73,728 tokens; 73,728 tokens / 1800 s = 40.96 t/s; 800 tokens / ~40 t/s = ~20 seconds for 800 tokens).
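
For anyone who wants to plug in their own numbers, here is a minimal sketch of that back-of-envelope estimate. Everything in it is an assumption carried over from the guess above: the 2x6-pixel token size, the 30-minute render time, and the idea that a text token costs about the same as an image token. The function name and the 800-token reply length are only illustrative.

```python
# Back-of-envelope throughput estimate; all inputs are guesses from the thread.
WIDTH, HEIGHT = 768, 1152   # image size quoted above
PIXELS_PER_TOKEN = 12       # assumed 2x6 pixels per token (unverified)
RENDER_SECONDS = 30 * 60    # roughly 30 minutes per image
REPLY_TOKENS = 800          # hypothetical reply length


def estimate(width, height, pixels_per_token, render_seconds, reply_tokens):
    """Return (image tokens, tokens per second, seconds for one reply)."""
    tokens = width * height / pixels_per_token
    tokens_per_second = tokens / render_seconds
    reply_seconds = reply_tokens / tokens_per_second
    return tokens, tokens_per_second, reply_seconds


tokens, tps, secs = estimate(
    WIDTH, HEIGHT, PIXELS_PER_TOKEN, RENDER_SECONDS, REPLY_TOKENS
)
print(f"{tokens:.0f} tokens, {tps:.2f} t/s, {secs:.1f} s for {REPLY_TOKENS} tokens")
```

With the numbers above this gives 73728 tokens, 40.96 t/s, and about 19.5 seconds for 800 tokens, but it says nothing about how a 26B text model compares to a 6B image model per token.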
