Ideogram 4 FP8 -> SDNQ UInt4

This is an experimental SDNQ UInt4 conversion of ideogram-ai/ideogram-4-fp8. It is intended for local research and non-commercial use under the upstream Ideogram 4 license. The conversion starts from the FP8 checkpoint: FP8 linear layers are first materialized back to bf16, and static SDNQ uint4 quantization is then applied to each component in turn.
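
To make the idea of a 4-bit weight format concrete, here is a minimal sketch of asymmetric uint4 quantization with two codes packed per byte. It is a generic min/max scheme written for illustration only, not the SDNQ algorithm itself; the function names (quantize_uint4, pack_nibbles, unpack_nibbles, dequantize_uint4) are hypothetical and do not come from the sdnq package.

```python
from typing import List, Tuple


def quantize_uint4(values: List[float]) -> Tuple[List[int], float, float]:
    """Map one group of weights to integer codes in [0, 15].

    Returns the codes plus the scale and minimum needed to dequantize.
    Generic min/max quantization; NOT the actual SDNQ method.
    """
    lo, hi = min(values), max(values)
    scale = (hi - lo) / 15 or 1.0  # fall back to 1.0 for a constant group
    codes = [min(15, max(0, round((v - lo) / scale))) for v in values]
    return codes, scale, lo


def pack_nibbles(codes: List[int]) -> bytes:
    """Pack two 4-bit codes per byte, low nibble first (odd lengths are zero-padded)."""
    if len(codes) % 2:
        codes = codes + [0]
    return bytes(codes[i] | (codes[i + 1] << 4) for i in range(0, len(codes), 2))


def unpack_nibbles(packed: bytes, count: int) -> List[int]:
    """Inverse of pack_nibbles; `count` drops the padding code if one was added."""
    out: List[int] = []
    for byte in packed:
        out.append(byte & 0x0F)
        out.append(byte >> 4)
    return out[:count]


def dequantize_uint4(codes: List[int], scale: float, lo: float) -> List[float]:
    """Reconstruct approximate weights from codes, scale, and minimum."""
    return [lo + c * scale for c in codes]


weights = [-1.0, -0.5, 0.0, 0.25, 1.0]
codes, scale, lo = quantize_uint4(weights)
packed = pack_nibbles(codes)
restored = dequantize_uint4(unpack_nibbles(packed, len(codes)), scale, lo)
print(len(packed), "bytes for", len(weights), "weights")
```

Each reconstructed weight differs from the original by at most half a quantization step (scale / 2). Real schemes also have to store the per-group scale and offset, which is one reason a 4-bit checkpoint is not exactly a quarter of the bf16 size.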

The repository contains SDNQ-compressed text_encoder, transformer, unconditional_transformer, and vae components. The official ideogram4 loader cannot instantiate SDNQ-packed custom transformers, so this repository includes a loader helper, ideogram4_sdnq_pipeline.py.

Usage

import torch
from ideogram4 import PRESETS
# ideogram4_sdnq_pipeline.py ships in this repository and must be on the import path.
from ideogram4_sdnq_pipeline import Ideogram4SDNQPipeline

# Load the SDNQ-packed components onto the GPU in bf16.
pipe = Ideogram4SDNQPipeline.from_pretrained(
    "WaveCut/ideogram-4-sdnq-uint4",
    device="cuda",
    dtype=torch.bfloat16,
)

# V4_DEFAULT_20 is the preset used for the benchmarks in this card.
preset = PRESETS["V4_DEFAULT_20"]
image = pipe(
    "a typographic poster reading HELLO WORLD",
    height=1024,
    width=1024,
    num_steps=preset.num_steps,
    guidance_schedule=preset.guidance_schedule,
    mu=preset.mu,
    std=preset.std,
    seed=4101,
    raise_on_caption_issues=False,
)[0]  # keep the first image of the result
image.save("out.png")

Install requirements:

pip install git+https://github.com/ideogram-oss/ideogram4 sdnq safetensors transformers accelerate pillow

Component Structure

Upstream FP8 structure:

text_encoder: Qwen3-VL text path used in text-only mode. Hidden states from 13 layers are concatenated for the DiT.
transformer: conditional 34-layer single-stream DiT.
unconditional_transformer: image-only negative branch used for asymmetric CFG.
vae: Flux2-style KL autoencoder decoder.
tokenizer and scheduler: copied from upstream.
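
The role of the two transformers can be illustrated with the standard classifier-free guidance combination. This is a minimal sketch, assuming the textbook CFG formula; the function name cfg_combine and the plain-list representation are hypothetical. The upstream ideogram4 code remains the reference for how the conditional and unconditional predictions and the guidance_schedule are actually combined.

```python
from typing import List


def cfg_combine(cond: List[float], uncond: List[float], scale: float) -> List[float]:
    """Textbook classifier-free guidance: uncond + scale * (cond - uncond).

    `cond` stands in for the conditional transformer's prediction and
    `uncond` for the unconditional transformer's prediction at one step.
    """
    if len(cond) != len(uncond):
        raise ValueError("predictions must have the same length")
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]


# With scale 1.0 the result equals the conditional prediction;
# larger scales push further away from the unconditional one.
print(cfg_combine([1.0, 2.0], [0.0, 1.0], 1.0))
print(cfg_combine([1.0, 2.0], [0.0, 1.0], 3.0))
```

In this picture, "asymmetric" means that the two predictions come from two different networks rather than from one network evaluated twice.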

Quantization

Component	Source materialized MB	SDNQ state MB	Quantize s	Quant peak nvidia MB
transformer	17698.84	4979.66	112.64	36525.00
unconditional_transformer	17698.84	4979.66	108.68	36525.00
text_encoder	14435.59	4097.53	102.32	24477.00
vae	160.31	50.19	2.68	861.00
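
The compression achieved per component can be read off the table with a few lines of arithmetic. The sketch below only reuses the numbers above; the dictionary and function names are illustrative.

```python
# (source materialized MB, SDNQ state MB), copied from the table above.
SIZES_MB = {
    "transformer": (17698.84, 4979.66),
    "unconditional_transformer": (17698.84, 4979.66),
    "text_encoder": (14435.59, 4097.53),
    "vae": (160.31, 50.19),
}


def compression_ratio(source_mb: float, sdnq_mb: float) -> float:
    """How many times smaller the SDNQ state is than the bf16 source."""
    return source_mb / sdnq_mb


for name, (source_mb, sdnq_mb) in SIZES_MB.items():
    print(f"{name}: {compression_ratio(source_mb, sdnq_mb):.2f}x")

total_source = sum(src for src, _ in SIZES_MB.values())
total_sdnq = sum(q for _, q in SIZES_MB.values())
print(f"total: {compression_ratio(total_source, total_sdnq):.2f}x")
```

The ratios come out at roughly 3.5x for the transformers and the text encoder and about 3.2x for the VAE, somewhat below the ideal 4x of going from 16 bits to 4 bits per weight.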

Benchmark

Hardware: RunPod NVIDIA RTX PRO 6000 Blackwell Server Edition, single process, concurrency 1. Generation used 10 structured JSON prompts at 1024x1024 with V4_DEFAULT_20. The FP8 baseline was loaded through the upstream ideogram4 Ideogram4Pipeline.from_pretrained recipe with weights_repo="ideogram-ai/ideogram-4-fp8"; magic-prompt expansion was disabled because the prompts are already structured captions.

Variant	Load s	Load peak reserved MB	Load peak nvidia MB	Cold request s	Hot mean s	Gen peak reserved MB	Gen peak nvidia MB
original	267.83	28198.00	28759.00	17.90	17.51	34430.00	35099.00
sdnq	239.46	14558.00	15109.00	18.56	16.52	21650.00	22321.00
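
To put the two rows in relative terms, the following sketch computes the percentage change of SDNQ against the original FP8 run for a few columns. It uses only the figures from the table; the names BENCHMARK and percent_reduction are illustrative.

```python
# (original FP8, SDNQ UInt4) values copied from the benchmark table above.
BENCHMARK = {
    "load_s": (267.83, 239.46),
    "load_peak_reserved_mb": (28198.0, 14558.0),
    "cold_request_s": (17.90, 18.56),
    "hot_mean_s": (17.51, 16.52),
    "gen_peak_reserved_mb": (34430.0, 21650.0),
}


def percent_reduction(baseline: float, candidate: float) -> float:
    """Positive when the candidate is lower than the baseline."""
    return 100.0 * (baseline - candidate) / baseline


for metric, (original, sdnq) in BENCHMARK.items():
    print(f"{metric}: {percent_reduction(original, sdnq):+.1f}%")
```

On this hardware, peak reserved memory drops by about 48% at load time and about 37% during generation, the hot mean request is about 6% faster, and the cold request is about 4% slower.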

Example Matrix

The comparison matrix (assets/original_vs_sdnq_vertical.webp) places the original FP8 and SDNQ UInt4 outputs side by side in narrow vertical columns. It is saved as a WebP at quality 95.

Prompt Set

#	id	summary
1	`editorial_watch_photo`	A photorealistic editorial product photograph of a transparent mechanical wristwatch resting on a wet black stone slab, with tiny engraved labels visible on the dial.
2	`risograph_botanical_poster`	A layered risograph botanical exhibition poster with bold overprint textures and clean typographic hierarchy.
3	`cyrillic_cafe_menu`	A cozy Moscow cafe menu board photographed straight-on, testing clean Cyrillic typography in chalk and printed labels.
4	`brutalist_architecture`	A cinematic architectural photograph of a brutalist library atrium with tiny wayfinding signs and people for scale.
5	`ink_manga_rain`	A dramatic black-and-white manga splash page of a courier cycling through rain, with sound effects and shop signage.
6	`museum_clay_render`	A polished 3D clay render of a museum diorama showing a future Arctic research station with labeled miniature modules.
7	`food_packaging_label`	A realistic premium chocolate bar packaging mockup with layered foil, embossed typography, and ingredient microcopy.
8	`fantasy_map_typography`	A hand-painted fantasy map on parchment with readable place names, compass ornament, and coastal illustrations.
9	`streetwear_lookbook`	A fashion lookbook cover photograph for a streetwear collection, with crisp cover typography and realistic fabric textures.
10	`scientific_cutaway`	A detailed scientific cutaway illustration of a compact fusion battery prototype with annotated parts and clean technical typography.

Files

prompts.json: the 10 structured prompts used for the comparison.
assets/original_vs_sdnq_vertical.webp: vertical side-by-side WebP comparison matrix for original FP8 vs SDNQ UInt4, quality 95.
assets/sdnq_vs_nf4_4090_vertical.webp: vertical side-by-side WebP comparison matrix for the RTX 4090 SDNQ vs official NF4 follow-up, quality 95.
benchmark/: raw benchmark JSONL/CSV files and summary.json.
quantization_manifest.json: component-level quantization timings, storage, and VRAM peaks.
ideogram4_sdnq_pipeline.py: loader helper for the SDNQ custom transformer components.

RTX 4090 Follow-up: SDNQ UInt4 vs Official NF4

Hardware: RunPod NVIDIA GeForce RTX 4090, 24 GB VRAM, single process, concurrency 1. Both variants used the same 10 structured captions from prompts.json, 1024x1024, V4_DEFAULT_20, and no magic-prompt expansion. The nf4 variant uses the official ideogram-ai/ideogram-4-nf4 checkpoint through the upstream ideogram4 loader.

Variant	Cases	Load s	Load peak reserved MB	Load peak nvidia MB	Cold request s	Hot mean s	Hot max s	Gen peak reserved MB	Gen peak nvidia MB
sdnq	10	211.61	14124.00	14466.00	59.65	37.05	37.57	19768.00	20521.00
nf4	10	269.31	15370.00	15766.00	36.57	36.31	36.77	21012.00	21801.00

Raw follow-up metrics are in benchmark/summary_4090_sdnq_vs_nf4.json, benchmark/sdnq_4090_metrics.*, and benchmark/nf4_4090_metrics.*. The exact runner used for the follow-up is benchmark/followup_runner.py.
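
On the RTX 4090, SDNQ loads faster but has a slower cold request and a slightly slower hot mean, so which variant finishes first depends on how many images are generated per session. The sketch below models a session as one load, one cold request, and hot requests for the remaining images. This is an assumption-laden simplification: it extrapolates the measured hot mean beyond the 10 measured cases, and load time in particular may vary with disk and network. The Timing class is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Timing:
    load_s: float
    cold_s: float
    hot_mean_s: float

    def session_seconds(self, n_images: int) -> float:
        """Load once, pay one cold request, then hot requests for the rest."""
        if n_images < 1:
            raise ValueError("n_images must be at least 1")
        return self.load_s + self.cold_s + (n_images - 1) * self.hot_mean_s


# Values copied from the RTX 4090 table above.
SDNQ_4090 = Timing(load_s=211.61, cold_s=59.65, hot_mean_s=37.05)
NF4_4090 = Timing(load_s=269.31, cold_s=36.57, hot_mean_s=36.31)

for n in (1, 10, 47, 48):
    sdnq_total = SDNQ_4090.session_seconds(n)
    nf4_total = NF4_4090.session_seconds(n)
    print(f"{n:>3} images: sdnq {sdnq_total:.2f} s, nf4 {nf4_total:.2f} s")
```

Under this simple model, SDNQ completes a session sooner for up to 47 images, after which the lower hot mean of nf4 outweighs its longer load time.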

License

This checkpoint is derived from ideogram-ai/ideogram-4-fp8 and follows the upstream Ideogram 4 non-commercial license. See LICENSE.md.
