---
license: apache-2.0
language:
- en
library_name: diffusers
tags:
- diffusers
- image-generation
- class-conditional
- nit
pipeline_tag: unconditional-image-generation
widget:
- output:
    url: demo.png
---

# NiT-diffusers

Native diffusers implementation of **NiT** (Native-resolution Image Transformer). Each variant folder is self-contained:

- `pipeline.py`: `NiTPipeline`
- `scheduler/scheduler_config.json`: `FlowMatchEulerDiscreteScheduler` config (the class ships with Diffusers)
- `transformer/nit_transformer_2d.py`: `NiTTransformer2DModel`
- `vae/`: `AutoencoderDC` weights + config

No separate `NiT-diffusers` package is required at inference time; you only need `diffusers` from PyPI plus the local custom code in the variant directory.

## Available checkpoints

| Checkpoint | Path | Resolution | Recommended settings |
| --- | --- | --- | --- |
| NiT-S | `./NiT-S` | 256×256 | 250 steps, CFG 2.25, interval (0.0, 0.7) |
| NiT-B | `./NiT-B` | 256×256 | 250 steps, CFG 2.25, interval (0.0, 0.7) |
| NiT-L | `./NiT-L` | 512×512 | 250 steps, CFG 2.05, interval (0.0, 0.7) |
| NiT-XL | `./NiT-XL` | 512×512 | 250 steps, CFG 2.05, interval (0.0, 0.7) |

## ImageNet class labels

Each variant keeps an English `id2label` map directly in its own `model_index.json` (DiT-style).

- `pipe.id2label`: map from class id to English label
- `pipe.labels`: reverse map (English synonym → id), sorted for browsing
- `pipe.get_label_ids("golden retriever")`
- `pipe(class_labels="golden retriever", ...)`: string labels are resolved automatically

## Inference

Run the bundled demo script from the repo root:

```bash
python demo_inference.py
```

This writes `demo.png` using `NiT-XL` with the settings below.

```python
from pathlib import Path

import torch
from diffusers import DiffusionPipeline

# Load the pipeline from a local variant folder using its bundled custom code.
model_dir = Path("./NiT-XL").resolve()
pipe = DiffusionPipeline.from_pretrained(
    str(model_dir),
    local_files_only=True,
    custom_pipeline=str(model_dir / "pipeline.py"),
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
)
pipe.to("cuda")

# Inspect the label map in both directions.
print(pipe.id2label[207])
print(pipe.get_label_ids("golden retriever"))

# Fix the seed so the sample is reproducible.
generator = torch.Generator(device="cuda").manual_seed(42)
image = pipe(
    class_labels="golden retriever",
    height=512,
    width=512,
    num_inference_steps=250,
    guidance_scale=2.05,
    guidance_interval=(0.0, 0.7),
    generator=generator,
).images[0]
image.save("demo.png")
```

Load a **variant subfolder** (e.g. `./NiT-XL`, `./NiT-L`, `./NiT-B`, or `./NiT-S`), not the repo root.

For NiT-S / NiT-B at 256×256 (official defaults):

```python
from pathlib import Path

import torch
from diffusers import DiffusionPipeline

model_dir = Path("./NiT-S").resolve()  # or ./NiT-B
pipe = DiffusionPipeline.from_pretrained(
    str(model_dir),
    local_files_only=True,
    custom_pipeline=str(model_dir / "pipeline.py"),
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
)
pipe.to("cuda")

image = pipe(
    class_labels="golden retriever",
    height=256,
    width=256,
    num_inference_steps=250,
    guidance_scale=2.25,
    guidance_interval=(0.0, 0.7),
    generator=torch.Generator(device="cuda").manual_seed(42),
).images[0]
```

Hub usage follows Hugging Face model-id style (`UserID/RepoID`):

```python
from diffusers import DiffusionPipeline
import torch

pipe = DiffusionPipeline.from_pretrained(
    "BiliSakura/NiT-diffusers",
    subfolder="NiT-XL",
    custom_pipeline="pipeline.py",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
)
pipe.to("cuda")
```

## Citation

```bibtex
@article{wang2025native,
  title={Native-Resolution Image Synthesis},
  author={Wang, Zidong and Bai, Lei and Yue, Xiangyu and Ouyang, Wanli and Zhang, Yiyuan},
  year={2025},
  eprint={2506.03131},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
```
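
## Appendix: recommended settings as a lookup

The recommended settings in the checkpoint table can be kept in one place so the sampling call does not hard-code them per variant. The following is a minimal sketch, not part of the repository: `RECOMMENDED_SETTINGS` and `recommended_kwargs` are hypothetical names introduced here, and the values are copied from the table in this README.

```python
# Hypothetical helper (not shipped with the repo): per-variant sampling
# settings copied from the "Available checkpoints" table in this README.
RECOMMENDED_SETTINGS = {
    "NiT-S": {"height": 256, "width": 256, "guidance_scale": 2.25},
    "NiT-B": {"height": 256, "width": 256, "guidance_scale": 2.25},
    "NiT-L": {"height": 512, "width": 512, "guidance_scale": 2.05},
    "NiT-XL": {"height": 512, "width": 512, "guidance_scale": 2.05},
}

# Settings that the table lists identically for every variant.
SHARED_SETTINGS = {
    "num_inference_steps": 250,
    "guidance_interval": (0.0, 0.7),
}


def recommended_kwargs(variant: str) -> dict:
    """Return a fresh dict of keyword arguments for the given variant name."""
    if variant not in RECOMMENDED_SETTINGS:
        known = ", ".join(sorted(RECOMMENDED_SETTINGS))
        raise ValueError(f"Unknown variant {variant!r}; expected one of: {known}")
    # Merge into a new dict so callers can modify the result safely.
    return {**SHARED_SETTINGS, **RECOMMENDED_SETTINGS[variant]}
```

Assuming a pipeline loaded as in the Inference section, the result could be splatted into the call, for example `pipe(class_labels="golden retriever", generator=generator, **recommended_kwargs("NiT-XL"))`. The keyword names match those used in the inference snippets above.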