Visual Generation Models
Diffusers-ready checkpoints for Scalable Interpolant Transformers (SiT), converted for local/offline use.
This root folder is a model collection that contains:
- SiT-S-2-256-diffusers
- SiT-B-2-256-diffusers
- SiT-L-2-256-diffusers
- SiT-XL-2-256-diffusers
- SiT-XL-2-512-diffusers

Each subfolder is a self-contained Diffusers model repo with:

- pipeline.py
- transformer/transformer_sit.py
- scheduler/scheduling_flow_match_sit.py
- transformer/diffusion_pytorch_model.safetensors
- vae/diffusion_pytorch_model.safetensors

Use paths relative to this root README:
| Model | Resolution | Local path |
|---|---|---|
| SiT-S/2 | 256x256 | ./SiT-S-2-256-diffusers |
| SiT-B/2 | 256x256 | ./SiT-B-2-256-diffusers |
| SiT-L/2 | 256x256 | ./SiT-L-2-256-diffusers |
| SiT-XL/2 | 256x256 | ./SiT-XL-2-256-diffusers |
| SiT-XL/2 | 512x512 | ./SiT-XL-2-512-diffusers |
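Each of these paths should contain the full Diffusers layout listed above (custom pipeline, transformer, scheduler, and VAE weights). As a quick sanity check before loading, you can verify that the expected files exist; this is a minimal sketch that assumes only the file layout described earlier in this README.

```python
from pathlib import Path

# Files each converted SiT repo in this collection is expected to contain,
# per the layout described above.
EXPECTED_FILES = [
    "pipeline.py",
    "transformer/transformer_sit.py",
    "scheduler/scheduling_flow_match_sit.py",
    "transformer/diffusion_pytorch_model.safetensors",
    "vae/diffusion_pytorch_model.safetensors",
]

def check_repo(model_path: str) -> None:
    root = Path(model_path)
    missing = [f for f in EXPECTED_FILES if not (root / f).is_file()]
    if missing:
        raise FileNotFoundError(f"{model_path} is missing: {missing}")
    print(f"{model_path}: all expected files present")

check_repo("./SiT-XL-2-512-diffusers")
```

The quickstart below loads the 512x512 checkpoint through the custom pipeline code shipped in each folder, which is why trust_remote_code=True is required: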
```python
import torch
from diffusers import DiffusionPipeline

model_path = "./SiT-XL-2-512-diffusers"  # change to any path in the table above
device = "cuda" if torch.cuda.is_available() else "cpu"

pipe = DiffusionPipeline.from_pretrained(
    model_path,
    trust_remote_code=True,  # loads the custom SiT pipeline/transformer/scheduler code in this folder
).to(device)

generator = torch.Generator(device=device).manual_seed(0)

# ImageNet class example: 207 = golden retriever
result = pipe(
    class_labels=207,
    height=512,
    width=512,
    num_inference_steps=250,  # official SiT comparisons commonly use 250 steps
    guidance_scale=4.0,
    generator=generator,
)
image = result.images[0]
image.save("sit_xl_512_demo.png")
```
model_path = "./SiT-S-2-256-diffusers"
# model_path = "./SiT-B-2-256-diffusers"
# model_path = "./SiT-L-2-256-diffusers"
# model_path = "./SiT-XL-2-256-diffusers"
pipe = DiffusionPipeline.from_pretrained(model_path, trust_remote_code=True).to(device)
image = pipe(
class_labels=207,
height=256,
width=256,
num_inference_steps=250,
guidance_scale=4.0,
generator=generator,
).images[0]
image.save("sit_256_demo.png")
The table below summarizes widely cited SiT numbers from the official project materials for class-conditional ImageNet generation.
| Model / setting | Resolution | FID-50K (lower is better) |
|---|---|---|
| SiT-S (400K steps) | 256x256 | 57.6 |
| SiT-B (400K steps) | 256x256 | 33.5 |
| SiT-L (400K steps) | 256x256 | 17.2 |
| SiT-XL (400K steps) | 256x256 | 8.6 |
| SiT-XL (cfg=1.5, ODE) | 256x256 | 2.15 |
| SiT-XL (cfg=1.5, SDE, w(t)=sigma_t) | 256x256 | 2.06 |
| SiT-XL (sample showcase) | 512x512 | Not reported in the same benchmark table |
Note: FID depends on the training recipe, sampler choice (ODE/SDE), guidance scale, and evaluation protocol. Treat this table as a pointer to the official SiT reports, not as a guarantee that every conversion/export here reproduces those numbers.
If you use SiT in your work, please cite:
```bibtex
@inproceedings{ma2024sit,
  title={SiT: Exploring Flow and Diffusion-based Generative Models with Scalable Interpolant Transformers},
  author={Ma, Nanye and Goldstein, Mark and Albergo, Michael S. and Boffi, Nicholas M. and Vanden-Eijnden, Eric and Xie, Saining},
  booktitle={European Conference on Computer Vision (ECCV)},
  year={2024},
  note={Accepted to ECCV 2024}
}
```