Ideogram 4 FP8 -> SDNQ UInt4

This is an experimental SDNQ UInt4 conversion of ideogram-ai/ideogram-4-fp8, intended for local research and non-commercial use under the upstream Ideogram 4 license. The conversion starts from the FP8 checkpoint: FP8 linear layers are materialized back to bf16, and static SDNQ uint4 quantization is then applied to each component in turn.

The repository contains SDNQ-compressed text_encoder, transformer, unconditional_transformer, and vae components. The official ideogram4 loader cannot instantiate SDNQ-packed custom transformers, so this repository ships its own loader helper, ideogram4_sdnq_pipeline.py.
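To make the idea of 4-bit weight storage concrete, here is a minimal sketch of generic group-wise asymmetric uint4 quantization with nibble packing. It is not the SDNQ implementation: the group size, the min/max scaling scheme, the low-nibble-first packing order, and all function names are assumptions chosen for illustration, and the FP8-to-bf16 materialization step is not modeled.

```python
# Generic group-wise asymmetric uint4 quantization (illustrative only, NOT SDNQ's code).

def quantize_uint4(weights, group_size=4):
    """Map floats to 4-bit codes, storing one (scale, minimum) pair per group."""
    codes, params = [], []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        lo, hi = min(group), max(group)
        scale = (hi - lo) / 15 or 1.0  # 15 is the largest uint4 value; avoid a zero scale
        params.append((scale, lo))
        codes.extend(min(15, max(0, round((w - lo) / scale))) for w in group)
    return codes, params


def pack_nibbles(codes):
    """Pack two 4-bit codes into each byte, low nibble first (assumed layout)."""
    if len(codes) % 2:
        codes = codes + [0]  # pad to an even count
    return bytes(codes[i] | (codes[i + 1] << 4) for i in range(0, len(codes), 2))


def unpack_nibbles(packed, count):
    """Recover the first `count` 4-bit codes from the packed bytes."""
    codes = []
    for byte in packed:
        codes.append(byte & 0x0F)
        codes.append(byte >> 4)
    return codes[:count]


def dequantize_uint4(codes, params, group_size=4):
    """Reconstruct approximate floats from codes and per-group parameters."""
    return [codes[i] * params[i // group_size][0] + params[i // group_size][1]
            for i in range(len(codes))]


weights = [-0.8, -0.1, 0.3, 0.7, 0.05, 0.10, 0.20, 0.40]
codes, params = quantize_uint4(weights)
packed = pack_nibbles(codes)
restored = dequantize_uint4(unpack_nibbles(packed, len(codes)), params)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(len(packed), "bytes for", len(weights), "weights; max abs error:", max_error)
```

Each weight costs half a byte plus a share of the per-group parameters, which is where the storage saving comes from in this toy version.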

Usage

import torch
from ideogram4 import PRESETS
from ideogram4_sdnq_pipeline import Ideogram4SDNQPipeline

pipe = Ideogram4SDNQPipeline.from_pretrained(
    "WaveCut/ideogram-4-sdnq-uint4",
    device="cuda",
    dtype=torch.bfloat16,
)

preset = PRESETS["V4_DEFAULT_20"]
image = pipe(
    "a typographic poster reading HELLO WORLD",
    height=1024,
    width=1024,
    num_steps=preset.num_steps,
    guidance_schedule=preset.guidance_schedule,
    mu=preset.mu,
    std=preset.std,
    seed=4101,
    raise_on_caption_issues=False,
)[0]
image.save("out.png")

Install requirements:

pip install git+https://github.com/ideogram-oss/ideogram4 sdnq safetensors transformers accelerate pillow

Component Structure

Upstream FP8 structure:

  • text_encoder: Qwen3-VL text path used in text-only mode. Hidden states from 13 layers are concatenated for the DiT.
  • transformer: conditional 34-layer single-stream DiT.
  • unconditional_transformer: image-only negative branch used for asymmetric CFG.
  • vae: Flux2-style KL autoencoder decoder.
  • tokenizer and scheduler: copied from upstream.
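The split between transformer and unconditional_transformer can be pictured with a toy classifier-free guidance step. This is a minimal sketch under assumptions: cond_model, uncond_model, and cfg_step are hypothetical stand-ins, and the standard CFG combination used here is assumed, not confirmed from the upstream code. It only illustrates that the negative branch receives no text features.

```python
# Toy stand-ins for the two branches; the real models are DiT networks.

def cond_model(latents, text_features):
    # Hypothetical conditional branch: sees both latents and text features.
    bias = sum(text_features) / len(text_features)
    return [0.9 * x + bias for x in latents]


def uncond_model(latents):
    # Hypothetical unconditional branch: sees latents only (no text input).
    return [0.9 * x for x in latents]


def cfg_step(latents, text_features, scale):
    # Standard CFG combination (assumed): start from the unconditional
    # prediction and move toward the conditional one by `scale`.
    cond = cond_model(latents, text_features)
    uncond = uncond_model(latents)
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]


latents = [0.2, -0.4, 1.0]
text_features = [0.3, 0.5]
for scale in (0.0, 1.0, 4.0):  # hypothetical guidance values
    print(scale, cfg_step(latents, text_features, scale))
```

With a scale of 0 the result equals the unconditional prediction, and with a scale of 1 it equals the conditional one; larger values extrapolate past it.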

Quantization

| Component | Source materialized (MB) | SDNQ state (MB) | Quantize (s) | Quant peak nvidia (MB) |
| --- | --- | --- | --- | --- |
| transformer | 17698.84 | 4979.66 | 112.64 | 36525.00 |
| unconditional_transformer | 17698.84 | 4979.66 | 108.68 | 36525.00 |
| text_encoder | 14435.59 | 4097.53 | 102.32 | 24477.00 |
| vae | 160.31 | 50.19 | 2.68 | 861.00 |
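The storage columns above can be turned into compression ratios with a few lines of arithmetic. The sketch below uses only the figures from the table; the variable names are illustrative.

```python
# (source materialized MB, SDNQ state MB) per component, copied from the table.
rows = {
    "transformer": (17698.84, 4979.66),
    "unconditional_transformer": (17698.84, 4979.66),
    "text_encoder": (14435.59, 4097.53),
    "vae": (160.31, 50.19),
}

# Per-component ratio of materialized size to SDNQ size.
ratios = {name: src / sdnq for name, (src, sdnq) in rows.items()}

total_source = sum(src for src, _ in rows.values())
total_sdnq = sum(sdnq for _, sdnq in rows.values())
overall_ratio = total_source / total_sdnq

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x smaller")
print(f"overall: {total_source:.2f} MB -> {total_sdnq:.2f} MB ({overall_ratio:.2f}x)")
```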

Benchmark

Hardware: RunPod NVIDIA RTX PRO 6000 Blackwell Server Edition, single process, concurrency 1. Generation used 10 structured JSON prompts at 1024x1024 with V4_DEFAULT_20. The FP8 baseline was loaded through the upstream ideogram4 Ideogram4Pipeline.from_pretrained recipe with weights_repo="ideogram-ai/ideogram-4-fp8"; magic-prompt expansion was disabled because the prompts are already structured captions.

| Variant | Load (s) | Load peak reserved (MB) | Load peak nvidia (MB) | Cold request (s) | Hot mean (s) | Gen peak reserved (MB) | Gen peak nvidia (MB) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| original | 267.83 | 28198.00 | 28759.00 | 17.90 | 17.51 | 34430.00 | 35099.00 |
| sdnq | 239.46 | 14558.00 | 15109.00 | 18.56 | 16.52 | 21650.00 | 22321.00 |
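For a quick read of the table, the relative change of the SDNQ variant against the FP8 baseline can be computed directly from the reported values. This is a small sketch using the table's numbers; the dictionary keys are illustrative names, not fields from the benchmark files.

```python
def pct_change(baseline, candidate):
    """Signed percentage change of candidate relative to baseline."""
    return (candidate - baseline) / baseline * 100.0


# Values copied from the benchmark table (original FP8 vs SDNQ UInt4).
original = {"load_s": 267.83, "load_peak_reserved_mb": 28198.0,
            "cold_request_s": 17.90, "hot_mean_s": 17.51,
            "gen_peak_reserved_mb": 34430.0}
sdnq = {"load_s": 239.46, "load_peak_reserved_mb": 14558.0,
        "cold_request_s": 18.56, "hot_mean_s": 16.52,
        "gen_peak_reserved_mb": 21650.0}

deltas = {key: pct_change(original[key], sdnq[key]) for key in original}
for key, delta in deltas.items():
    print(f"{key}: {delta:+.1f}%")
```

Negative values mean the SDNQ variant used less time or memory than the baseline for that metric.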

Example Matrix

The matrix below keeps the original FP8 and SDNQ UInt4 outputs side by side in narrow vertical columns. It is a WebP at quality 95.

Figure: Original FP8 vs SDNQ UInt4 vertical comparison (assets/original_vs_sdnq_vertical.webp)

Prompt Set

| # | id | summary |
| --- | --- | --- |
| 1 | editorial_watch_photo | A photorealistic editorial product photograph of a transparent mechanical wristwatch resting on a wet black stone slab, with tiny engraved labels visible on the dial. |
| 2 | risograph_botanical_poster | A layered risograph botanical exhibition poster with bold overprint textures and clean typographic hierarchy. |
| 3 | cyrillic_cafe_menu | A cozy Moscow cafe menu board photographed straight-on, testing clean Cyrillic typography in chalk and printed labels. |
| 4 | brutalist_architecture | A cinematic architectural photograph of a brutalist library atrium with tiny wayfinding signs and people for scale. |
| 5 | ink_manga_rain | A dramatic black-and-white manga splash page of a courier cycling through rain, with sound effects and shop signage. |
| 6 | museum_clay_render | A polished 3D clay render of a museum diorama showing a future Arctic research station with labeled miniature modules. |
| 7 | food_packaging_label | A realistic premium chocolate bar packaging mockup with layered foil, embossed typography, and ingredient microcopy. |
| 8 | fantasy_map_typography | A hand-painted fantasy map on parchment with readable place names, compass ornament, and coastal illustrations. |
| 9 | streetwear_lookbook | A fashion lookbook cover photograph for a streetwear collection, with crisp cover typography and realistic fabric textures. |
| 10 | scientific_cutaway | A detailed scientific cutaway illustration of a compact fusion battery prototype with annotated parts and clean technical typography. |

Files

  • prompts.json: the 10 structured prompts used for the comparison.
  • assets/original_vs_sdnq_vertical.webp: vertical side-by-side WebP comparison matrix for original FP8 vs SDNQ UInt4, quality 95.
  • assets/sdnq_vs_nf4_4090_vertical.webp: vertical side-by-side WebP comparison matrix for the RTX 4090 SDNQ vs official NF4 follow-up, quality 95.
  • benchmark/: raw benchmark JSONL/CSV files and summary.json.
  • quantization_manifest.json: component-level quantization timings, storage, and VRAM peaks.
  • ideogram4_sdnq_pipeline.py: loader helper for the SDNQ custom transformer components.

RTX 4090 Follow-up: SDNQ UInt4 vs Official NF4

Hardware: RunPod NVIDIA GeForce RTX 4090, 24 GB VRAM, single process, concurrency 1. Both variants used the same 10 structured captions from prompts.json at 1024x1024 with V4_DEFAULT_20 and no magic-prompt expansion. The nf4 variant uses the official ideogram-ai/ideogram-4-nf4 checkpoint, loaded through the upstream ideogram4 loader.

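| Variant | Cases | Load (s) | Load peak reserved (MB) | Load peak nvidia (MB) | Cold request (s) | Hot mean (s) | Hot max (s) | Gen peak reserved (MB) | Gen peak nvidia (MB) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| sdnq | 10.00 | 211.61 | 14124.00 | 14466.00 | 59.65 | 37.05 | 37.57 | 19768.00 | 20521.00 |
| nf4 | 10.00 | 269.31 | 15370.00 | 15766.00 | 36.57 | 36.31 | 36.77 | 21012.00 | 21801.00 |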
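One way to read this table is to separate first-request overhead from steady-state speed. The sketch below subtracts the hot mean from the cold request time for each variant and compares steady-state time and peak memory; it uses only the table's values, and the key names are illustrative.

```python
# Values copied from the RTX 4090 table.
runs = {
    "sdnq": {"cold_request_s": 59.65, "hot_mean_s": 37.05,
             "gen_peak_reserved_mb": 19768.0},
    "nf4": {"cold_request_s": 36.57, "hot_mean_s": 36.31,
            "gen_peak_reserved_mb": 21012.0},
}

# Extra seconds the first request took compared with the steady-state mean.
cold_overhead = {name: r["cold_request_s"] - r["hot_mean_s"]
                 for name, r in runs.items()}

# SDNQ relative to NF4, in percent (positive means SDNQ is higher).
hot_mean_delta = (runs["sdnq"]["hot_mean_s"] / runs["nf4"]["hot_mean_s"] - 1) * 100
gen_peak_delta = (runs["sdnq"]["gen_peak_reserved_mb"]
                  / runs["nf4"]["gen_peak_reserved_mb"] - 1) * 100

for name, seconds in cold_overhead.items():
    print(f"{name}: first request took {seconds:.2f} s longer than the hot mean")
print(f"hot mean, sdnq vs nf4: {hot_mean_delta:+.1f}%")
print(f"gen peak reserved, sdnq vs nf4: {gen_peak_delta:+.1f}%")
```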

Figure: SDNQ vs official NF4 on RTX 4090 (assets/sdnq_vs_nf4_4090_vertical.webp)

Raw follow-up metrics are in benchmark/summary_4090_sdnq_vs_nf4.json, benchmark/sdnq_4090_metrics.*, and benchmark/nf4_4090_metrics.*. The exact runner used for the follow-up is benchmark/followup_runner.py.

License

This checkpoint is derived from ideogram-ai/ideogram-4-fp8 and follows the upstream Ideogram 4 non-commercial license. See LICENSE.md.
