To QAT or not?
Since QAT stuff is coming out, I'm curious: what are your thoughts on it?
And do you think training on it instead would be worthwhile going forward?
Money.
The 31B tune of Serenity at two epochs was ~$40. Omega is probably going to be in the range of $100 if I do it on 31B.
QAT (from my very basic understanding) requires FFT (full fine-tuning), which is expensive.
Nothing stops you from doing QAT with a LoRA. Money only becomes a problem if the LoRA is already trained, because then yes, you would have to pay to train it again.
I'll look into this more for future tunes then, but from what I've heard on the Discord I frequent, QAT models tend to be dumb.
I'll need to actually try QAT and see for myself one of these days.
I'll probably tune the new 26B once Axolotl support has matured. 26Bs aren't that expensive to train.

Overall, it's a bit dumber than 12B. If I were measuring in "Gemmas," I'd say 9B. The claimed speed is 3.5 times higher than the autoregressive 26B-A4B (though no one complained about its speed anyway). This would be good for older PCs like mine (which don't have all those CUDA cores), but I'm not sure the diffusion model will perform decently without a GPU. At least, it takes me about 30 minutes to render a single 768x1152 image using the 6B model in Q8_0 with 8 denoising steps. I'm just curious to try the diffusion model in RP.
Gotcha, I was asking purely about viability, not considering cost. And if QAT means the model may get dumber, then it's probably best to avoid it for now.
With SDXL it tends to take 5 minutes running on a 3060.
With diffusion models like LLaDA, they say each token gets a confidence rating, so only the low-confidence tokens get rerolled while the higher-confidence ones are kept. So it should be faster... Plus, for image generation I think each token is a 2x6 block of pixels, so you're probably looking at ~74k tokens per image. If we assume the same throughput for text, it should be effectively ~40 t/s on your config. Though quality of replies is still the bigger factor.
But that's a guess, and my math could be totally wrong (768x1152/12 = 73,728 tokens; 73,728 tokens / 1,800 s = 40.96 t/s; 800 tokens / 40 t/s = ~20 seconds for 800 tokens).
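For anyone who wants to redo that estimate with their own numbers, here is a rough Python sketch of the same arithmetic. The function name and the 12-pixels-per-token figure are only assumptions taken from the guess above, not measured values or anything from a real library, and it also assumes text tokens would come out at the same rate as image tokens.

```python
def estimate_tokens_per_second(width, height, pixels_per_token, render_seconds):
    """Back-of-the-envelope throughput: image pixels -> assumed tokens -> tokens/s."""
    tokens = (width * height) / pixels_per_token
    return tokens, tokens / render_seconds

# Numbers from the thread: 768x1152 image, ~30 minutes per render.
# pixels_per_token=12 is the guessed 2x6 patch size (an assumption).
tokens, tps = estimate_tokens_per_second(768, 1152, 12, 30 * 60)

# Time for an 800-token reply, assuming the same rate carries over to text.
reply_seconds = 800 / tps

print(f"{tokens:.0f} tokens, {tps:.2f} t/s, ~{reply_seconds:.1f} s for 800 tokens")
```

Changing `pixels_per_token` or `render_seconds` shows how sensitive the ~40 t/s figure is to those two guesses.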
