Krea Turbo SVDQuant
Transformer-only SVDQuant checkpoint for krea/Krea-2-Turbo, created for low-VRAM Krea2 inference.
This repo contains only the quantized transformer weights/config. You still load the base Krea2 pipeline from Hugging Face, then replace pipe.transformer with this SVDQuant transformer.
Files
svdquant_config.json
transformer_svdquant.safetensors
README.md
Checkpoint summary:
quantized transformer layers: 224
calibrated: true
SVD ranks: attention=64, MLP=128
residual: groupwise INT4, group_size=128
checkpoint size: ~6.5GB
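To make the `residual: groupwise INT4, group_size=128` entry concrete, here is a minimal, dependency-free sketch of groupwise 4-bit quantization applied to one weight row. It assumes symmetric signed INT4 with one scale per group; the checkpoint's actual scheme (symmetric or asymmetric, scale dtype, packing layout) is not stated in this card, and the function names are illustrative only.

```python
import random

def quantize_groupwise_int4(row, group_size=128):
    """Quantize a list of floats to signed INT4 codes, one scale per group.

    Assumption: symmetric quantization into [-8, 7]; this is not necessarily
    the scheme used by the checkpoint.
    """
    if len(row) % group_size != 0:
        raise ValueError("row length must be a multiple of group_size")
    codes, scales = [], []
    for start in range(0, len(row), group_size):
        group = row[start:start + group_size]
        max_abs = max(abs(v) for v in group)
        # Map the largest magnitude in the group to code 7 (or -7).
        scale = max_abs / 7.0 if max_abs > 0 else 1.0
        scales.append(scale)
        codes.extend(max(-8, min(7, round(v / scale))) for v in group)
    return codes, scales

def dequantize_groupwise_int4(codes, scales, group_size=128):
    """Rebuild approximate floats from the codes and per-group scales."""
    return [c * scales[i // group_size] for i, c in enumerate(codes)]

random.seed(0)
row = [random.gauss(0.0, 0.02) for _ in range(256)]
codes, scales = quantize_groupwise_int4(row)
restored = dequantize_groupwise_int4(codes, scales)
max_err = max(abs(a - b) for a, b in zip(row, restored))
print(len(scales), max_err)
```

With a group size of 128, each group stores 128 four-bit codes plus one scale. If the scale were stored in 16 bits (an assumption), that works out to 4 + 16/128 = 4.125 bits per weight.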
Aesthetic samples
Generated with this checkpoint using the low-VRAM 768px command (see "12GB-class 768px run").
Prompts:
- a serene glasshouse cafe at golden hour, rain on windows, soft cinematic lighting, lush plants, pastel colors, aesthetic editorial photography, ultra detailed
- a dreamy coastal bedroom with linen curtains flowing in ocean breeze, warm sunset, minimalist interior, film grain, aesthetic lifestyle photography
- a cinematic portrait of a woman in a lavender field at dusk, soft backlight, shallow depth of field, ethereal fashion editorial, beautiful color grading
- a cozy neon bookstore at night, reflections on wet street, cinematic bokeh, warm interior glow, aesthetic urban photography, ultra detailed
Install
git clone https://github.com/Tanmaypatil123/krea2-svdquant.git
cd krea2-svdquant
python -m venv .venv
source .venv/bin/activate
pip install -U pip
pip install -U torch torchvision --index-url https://download.pytorch.org/whl/cu128
pip install -U "diffusers @ git+https://github.com/huggingface/diffusers.git" transformers accelerate safetensors huggingface_hub hf_xet sentencepiece protobuf triton
pip install -e .
On RunPod / CUDA images that already include PyTorch, prefer:
python -m venv --system-site-packages .venv
source .venv/bin/activate
pip install -U "diffusers @ git+https://github.com/huggingface/diffusers.git" transformers accelerate safetensors huggingface_hub hf_xet sentencepiece protobuf triton
pip install -e .
Recommended 1024px low-VRAM run
python scripts/infer_svdquant_transformer.py \
--svdquant-transformer Tanmaypatil123/krea-turbo-svdquant \
--backend pytorch_sim \
--low-vram \
--cpu-offload model \
--block-offload \
--num-blocks-on-gpu 1 \
--out-chunk 1024 \
--vae-tiling \
--vae-slicing \
--height 1024 \
--width 1024 \
--steps 8 \
--prompt "a cinematic photo of a small friendly white robot doctor, soft studio lighting" \
--out outputs/krea_svdquant.png
Measured on RunPod RTX PRO 6000 Blackwell, 1024x1024, 8 steps:
[vram] load: allocated=1.24GiB reserved=1.27GiB peak=1.24GiB
[vram] encode: allocated≈10.6GiB reserved≈10.7GiB peak≈10.6GiB
[vram] offload: allocated≈8.9GiB reserved≈9.1GiB peak≈10.6GiB
[vram] generate: allocated=7.63GiB reserved=8.11GiB peak≈16.96GiB
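When comparing runs, it can help to turn these `[vram]` lines into numbers. The parser below is a small sketch that assumes exactly the line shape shown above (`[vram] <stage>: key=valueGiB ...`, with either `=` or `≈` as the separator). `parse_vram_line` is an illustrative helper, not part of the repository.

```python
import re

# One "name=1.23GiB" or "name≈1.23GiB" field.
_FIELD = re.compile(r"(\w+)[=≈]([0-9.]+)GiB")

def parse_vram_line(line):
    """Return (stage, {field: GiB}) for a '[vram] stage: k=vGiB ...' line."""
    match = re.match(r"\[vram\]\s+(\w+):\s*(.*)", line.strip())
    if match is None:
        raise ValueError(f"not a [vram] line: {line!r}")
    stage, rest = match.groups()
    return stage, {key: float(value) for key, value in _FIELD.findall(rest)}

log = """\
[vram] load: allocated=1.24GiB reserved=1.27GiB peak=1.24GiB
[vram] encode: allocated≈10.6GiB reserved≈10.7GiB peak≈10.6GiB
[vram] offload: allocated≈8.9GiB reserved≈9.1GiB peak≈10.6GiB
[vram] generate: allocated=7.63GiB reserved=8.11GiB peak≈16.96GiB
"""
stages = dict(parse_vram_line(line) for line in log.splitlines())

# Stage with the highest recorded peak, and that peak in GiB.
worst = max(stages, key=lambda s: stages[s]["peak"])
print(worst, stages[worst]["peak"])  # generate 16.96
```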
12GB-class 768px run
python scripts/infer_svdquant_transformer.py \
--svdquant-transformer Tanmaypatil123/krea-turbo-svdquant \
--backend pytorch_sim \
--low-vram \
--cpu-offload model \
--block-offload \
--num-blocks-on-gpu 1 \
--out-chunk 1024 \
--vae-tiling \
--vae-slicing \
--height 768 \
--width 768 \
--steps 8 \
--prompt "a cinematic photo of a small friendly white robot doctor, soft studio lighting" \
--out outputs/krea_svdquant_768.png
Measured on RunPod RTX PRO 6000 Blackwell:
seconds≈9.2
[vram] generate: allocated=7.63GiB reserved=8.11GiB peak≈11.88GiB
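Both measurements report the same allocation during generation (7.63GiB) but different peaks, so the gap between peak and allocated gives a rough view of the transient memory each resolution needs. The arithmetic below uses only the figures logged above. Attributing the transient part to activations and attention buffers is an interpretation, not something the logs state.

```python
# Figures copied from the two "[vram] generate" lines above (GiB).
runs = {
    1024: {"allocated": 7.63, "peak": 16.96},
    768: {"allocated": 7.63, "peak": 11.88},
}

# Transient memory = peak minus the steady allocation, per resolution.
transient = {res: round(r["peak"] - r["allocated"], 2) for res, r in runs.items()}
pixel_ratio = (1024 / 768) ** 2
transient_ratio = transient[1024] / transient[768]

print(transient)                  # {1024: 9.33, 768: 4.25}
print(round(pixel_ratio, 2))      # 1.78
print(round(transient_ratio, 2))  # 2.2
```

On these two data points the transient part grows by about 2.2x for 1.78x more pixels, which is in line with the Limitations note that attention/activation memory is the next optimization target.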
Python API
import torch
from diffusers import Krea2Pipeline
from krea2_svdquant.runtime.load import load_svdquant_transformer

# Load the base Krea2 pipeline (tokenizer, text encoder, VAE, scheduler, transformer).
pipe = Krea2Pipeline.from_pretrained("krea/Krea-2-Turbo", torch_dtype=torch.bfloat16)

# Load the SVDQuant weights into the pipeline's transformer.
load_svdquant_transformer(
    pipe.transformer,
    "Tanmaypatil123/krea-turbo-svdquant",
    backend="pytorch_sim",
)
pipe.to("cuda")

image = pipe(
    "a cinematic photo of a small friendly white robot doctor",
    num_inference_steps=8,
    guidance_scale=0.0,
    height=1024,
    width=1024,
).images[0]
image.save("krea_svdquant.png")
For consumer GPUs, use the script path above, because it enables prompt-embedding caching, text encoder offload/removal, transformer block offload, VAE tiling/slicing, and the chunked SVDQuant runtime.
Backend notes
- pytorch_sim: recommended practical backend today. Uses packed qweights + chunked low-VRAM runtime.
- triton_blackwell / triton_generic: experimental fused W4A16 residual and low-rank add kernels. Correctness verified, but currently slower than the PyTorch chunked runtime for full Krea2.
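For readers unfamiliar with the "residual and low-rank add" mentioned for the Triton backends: an SVDQuant linear layer computes a higher-precision low-rank branch plus a product with the dequantized 4-bit residual weights. The toy example below illustrates that structure in plain Python on tiny matrices. The names (`svdquant_linear`, `down`, `up`), the shapes, and the per-row scale are illustrative assumptions and do not reflect the repository's kernels or tensor layout.

```python
def matvec(matrix, vector):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def svdquant_linear(x, down, up, codes, scales):
    """Compute y = up @ (down @ x) + dequant(codes) @ x.

    down: (rank, in) and up: (out, rank) form the low-rank branch.
    codes: (out, in) signed INT4 integers; scales: one float per output row
    (a simplification: the checkpoint uses one scale per group of 128).
    """
    low_rank = matvec(up, matvec(down, x))
    residual_w = [[c * s for c in row] for row, s in zip(codes, scales)]
    residual = matvec(residual_w, x)
    return [a + b for a, b in zip(low_rank, residual)]

x = [1.0, 2.0, -1.0]
down = [[0.5, 0.5, 0.0]]          # rank 1
up = [[2.0], [-2.0]]
codes = [[1, -2, 3], [0, 4, -8]]  # values stay within [-8, 7]
scales = [0.5, 0.25]

y = svdquant_linear(x, down, up, codes, scales)
print(y)  # [0.0, 1.0]
```

A fused kernel would aim to produce the same sum without materializing the dequantized weight matrix; this sketch materializes it for clarity.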
Limitations
- This is a transformer-only checkpoint; it does not include tokenizer, scheduler, VAE, or text encoder.
- 1024px currently peaks at ~17GiB with the recommended low-VRAM path; 768px peaks at ~11.9GiB (12GB-class).
- For 1024px at 12-14GB, the next optimization target is attention/activation memory or a calibrated Blackwell FP4 / tl.dot_scaled checkpoint path.
- Quality is intended to stay close to Krea2 Turbo, but this is an experimental SVDQuant checkpoint and may differ from BF16 output.
Citation / credits
Base model: krea/Krea-2-Turbo.
SVDQuant runtime/checkpoint tooling: https://github.com/Tanmaypatil123/krea2-svdquant
LoRA compatibility
This checkpoint can now run transformer LoRAs through the GitHub runtime's SVDQuant LoRA loader. The LoRA is attached as an inference-only side branch directly to each replaced SVDQuantLinear, so users can keep the transformer SVDQuant checkpoint active while applying Krea2 LoRAs.
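The side-branch arrangement described above can be pictured as a wrapper that leaves the quantized layer untouched and adds a scaled low-rank update to its output. This is a conceptual sketch in plain Python; `LoRASideBranch` and its arguments are hypothetical names, and the repository's actual loader and `SVDQuantLinear` interface may differ.

```python
class LoRASideBranch:
    """Wrap a base layer and add scale * B @ (A @ x) to its output."""

    def __init__(self, base_forward, lora_a, lora_b, scale=1.0):
        self.base_forward = base_forward  # e.g. the quantized layer's forward
        self.lora_a = lora_a              # (rank, in) down-projection
        self.lora_b = lora_b              # (out, rank) up-projection
        self.scale = scale

    @staticmethod
    def _matvec(matrix, vector):
        return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

    def __call__(self, x):
        base = self.base_forward(x)  # base weights are never modified
        delta = self._matvec(self.lora_b, self._matvec(self.lora_a, x))
        return [b + self.scale * d for b, d in zip(base, delta)]


def base_layer(x):
    # Stand-in for a quantized linear layer: a fixed diagonal weight.
    return [2.0 * x[0], 4.0 * x[1]]

layer = LoRASideBranch(base_layer, lora_a=[[1.0, 1.0]], lora_b=[[1.0], [-1.0]], scale=0.5)
print(layer([1.0, 3.0]))  # [4.0, 10.0]
```

Because the update is added at inference time rather than merged into the weights, the 4-bit residual does not need to be requantized when a LoRA is attached or its scale is changed.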
Example tested LoRA: krea/Krea-2-LoRA-retroanime.
python scripts/infer_svdquant_transformer.py \
--svdquant-transformer Patil/krea-turbo-svdquant \
--lora krea/Krea-2-LoRA-retroanime \
--lora-weight-name retroanime.safetensors \
--lora-scale 0.85 \
--backend pytorch_sim \
--low-vram \
--cpu-offload model \
--block-offload \
--num-blocks-on-gpu 1 \
--out-chunk 1024 \
--vae-tiling \
--vae-slicing \
--height 768 \
--width 768 \
--steps 8
RTX 4090 verification with the retroanime LoRA:
loaded_svdquant_layers=224
loaded_lora=retroanime.safetensors matched_layers=224 scale=0.85
seconds=14.804
[vram] generate: allocated=7.63GiB reserved=8.11GiB peak=11.90GiB





