---
license: apache-2.0
language:
- en
library_name: diffusers
tags:
- diffusers
- image-generation
- class-conditional
- nit
pipeline_tag: unconditional-image-generation
widget:
- output:
    url: demo.png
---
# NiT-diffusers
Native diffusers implementation of **NiT** (Native-resolution Image Transformer). Each variant folder is self-contained:
- `pipeline.py`: `NiTPipeline`
- `scheduler/scheduler_config.json`: `FlowMatchEulerDiscreteScheduler` config (the class ships with Diffusers)
- `transformer/nit_transformer_2d.py`: `NiTTransformer2DModel`
- `vae/`: `AutoencoderDC` weights and config
No separate `NiT-diffusers` package is needed at inference time: install `diffusers` from PyPI and load the custom code bundled in the variant directory.
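Because each variant folder has to be complete on its own, it can help to check a local copy before loading it. The snippet below is a minimal sketch, not part of this repo: `missing_variant_entries` and `EXPECTED_ENTRIES` are hypothetical names, and the entry list is taken from the layout described in this README (including the `model_index.json` that holds the label map).

```python
from pathlib import Path

# Entries this README says each variant folder contains.
# The tuple and the helper below are illustrative, not shipped with the repo.
EXPECTED_ENTRIES = (
    "model_index.json",
    "pipeline.py",
    "scheduler/scheduler_config.json",
    "transformer/nit_transformer_2d.py",
    "vae",
)


def missing_variant_entries(model_dir):
    """Return the expected entries that are absent from a variant folder."""
    root = Path(model_dir)
    return [entry for entry in EXPECTED_ENTRIES if not (root / entry).exists()]
```

Calling `missing_variant_entries("./NiT-XL")` on a complete download should return an empty list; any names it does return point at files that still need to be fetched.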
## Available checkpoints
| Checkpoint | Path | Resolution | Recommended settings |
| --- | --- | --- | --- |
| NiT-S | `./NiT-S` | 256×256 | 250 steps, CFG 2.25, interval (0.0, 0.7) |
| NiT-B | `./NiT-B` | 256×256 | 250 steps, CFG 2.25, interval (0.0, 0.7) |
| NiT-L | `./NiT-L` | 512×512 | 250 steps, CFG 2.05, interval (0.0, 0.7) |
| NiT-XL | `./NiT-XL` | 512×512 | 250 steps, CFG 2.05, interval (0.0, 0.7) |
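The recommended settings can also be kept in code so that a script picks the right values per checkpoint. This is a small sketch under the assumption that the table columns map onto the `height`, `width`, `num_inference_steps`, `guidance_scale`, and `guidance_interval` arguments used in the inference examples in this README; `RECOMMENDED_SETTINGS` and `recommended_kwargs` are hypothetical names.

```python
# Values copied from the checkpoint table; the names here are illustrative.
RECOMMENDED_SETTINGS = {
    "NiT-S": {"height": 256, "width": 256, "guidance_scale": 2.25},
    "NiT-B": {"height": 256, "width": 256, "guidance_scale": 2.25},
    "NiT-L": {"height": 512, "width": 512, "guidance_scale": 2.05},
    "NiT-XL": {"height": 512, "width": 512, "guidance_scale": 2.05},
}

# Settings shared by every checkpoint in the table.
SHARED_SETTINGS = {"num_inference_steps": 250, "guidance_interval": (0.0, 0.7)}


def recommended_kwargs(variant):
    """Return a fresh dict of sampling keyword arguments for a checkpoint."""
    if variant not in RECOMMENDED_SETTINGS:
        known = ", ".join(sorted(RECOMMENDED_SETTINGS))
        raise ValueError(f"Unknown variant {variant!r}; expected one of: {known}")
    return {**RECOMMENDED_SETTINGS[variant], **SHARED_SETTINGS}
```

With a loaded pipeline, the result could be unpacked directly, for example `pipe(class_labels="golden retriever", **recommended_kwargs("NiT-XL"))`.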
## ImageNet class labels
Each variant keeps an English `id2label` map directly in its own `model_index.json` (DiT-style).
- `pipe.id2label`: inspect the id → English label correspondence
- `pipe.labels`: reverse map (English synonym → id), sorted for browsing
- `pipe.get_label_ids("golden retriever")`
- `pipe(class_labels="golden retriever", ...)`: string labels are resolved automatically
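To make the relationship between `id2label` and the reverse map concrete, here is a rough sketch of how such a lookup can be built. It is modeled on the approach used by `DiTPipeline` in Diffusers (comma-separated synonyms per id); whether `NiTPipeline` does exactly this is an assumption. The function names and the three-entry sample map are illustrative only.

```python
def build_label_lookup(id2label):
    """Build a sorted synonym -> id map from an id -> "syn1, syn2" map."""
    lookup = {}
    for class_id, names in id2label.items():
        for name in names.split(","):
            lookup[name.strip()] = int(class_id)
    return dict(sorted(lookup.items()))


def resolve_label_ids(lookup, labels):
    """Map one label string or a list of label strings to class ids."""
    if isinstance(labels, str):
        labels = [labels]
    unknown = [label for label in labels if label not in lookup]
    if unknown:
        raise ValueError(f"Unknown label(s): {unknown}")
    return [lookup[label] for label in labels]


# Illustrative excerpt; the real map lives in each variant's model_index.json.
sample_id2label = {
    "1": "goldfish, Carassius auratus",
    "207": "golden retriever",
    "208": "Labrador retriever",
}
lookup = build_label_lookup(sample_id2label)
print(resolve_label_ids(lookup, "golden retriever"))  # [207]
```

Splitting on commas means every synonym of a class resolves to the same id, so `"goldfish"` and `"Carassius auratus"` both map to `1` in this sample.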
## Inference
Run the bundled demo script from the repo root:
```bash
python demo_inference.py
```
This writes `demo.png` using `NiT-XL` with the settings below.
```python
from pathlib import Path

import torch
from diffusers import DiffusionPipeline

# Point at a variant subfolder, not the repo root.
model_dir = Path("./NiT-XL").resolve()

# Load from local files, using the custom pipeline code bundled with the variant.
pipe = DiffusionPipeline.from_pretrained(
    str(model_dir),
    local_files_only=True,
    custom_pipeline=str(model_dir / "pipeline.py"),
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
)
pipe.to("cuda")

# Inspect the label map in both directions.
print(pipe.id2label[207])
print(pipe.get_label_ids("golden retriever"))

# Seed the generator for reproducible sampling.
generator = torch.Generator(device="cuda").manual_seed(42)
image = pipe(
    class_labels="golden retriever",
    height=512,
    width=512,
    num_inference_steps=250,
    guidance_scale=2.05,
    guidance_interval=(0.0, 0.7),
    generator=generator,
).images[0]
image.save("demo.png")
```
Load a **variant subfolder** (e.g. `./NiT-XL`, `./NiT-L`, `./NiT-B`, or `./NiT-S`), not the repo root.
For NiT-S / NiT-B at 256×256 (official defaults):
```python
from pathlib import Path

import torch
from diffusers import DiffusionPipeline

model_dir = Path("./NiT-S").resolve()  # or ./NiT-B
pipe = DiffusionPipeline.from_pretrained(
    str(model_dir),
    local_files_only=True,
    custom_pipeline=str(model_dir / "pipeline.py"),
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
)
pipe.to("cuda")

image = pipe(
    class_labels="golden retriever",
    height=256,
    width=256,
    num_inference_steps=250,
    guidance_scale=2.25,
    guidance_interval=(0.0, 0.7),
    generator=torch.Generator(device="cuda").manual_seed(42),
).images[0]
```
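Both examples above pass `guidance_interval=(0.0, 0.7)`. This README does not spell out how the pipeline uses it, so the following is only a sketch of one common reading, interval-limited classifier-free guidance: guidance is applied while the sampling time lies inside the interval, and the plain conditional prediction is used outside it. The function name, the use of a normalized time `t` in [0, 1], and the inclusive bounds are all assumptions, not the behavior of `NiTPipeline`.

```python
def apply_interval_cfg(cond, uncond, t, guidance_scale, guidance_interval):
    """Combine conditional and unconditional predictions with interval-limited CFG.

    Assumes `t` is a normalized time in [0, 1] and the interval is inclusive.
    Works on floats here; the same arithmetic applies elementwise to tensors.
    """
    low, high = guidance_interval
    if low <= t <= high:
        # Standard classifier-free guidance: extrapolate away from the
        # unconditional prediction.
        return uncond + guidance_scale * (cond - uncond)
    # Outside the interval, fall back to the conditional prediction alone.
    return cond
```

Under these assumptions, `apply_interval_cfg(1.0, 0.0, 0.5, 2.0, (0.0, 0.7))` extrapolates to `2.0`, while the same call at `t=0.9` returns the conditional value `1.0` unchanged.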
To load from the Hub, pass the repo id in Hugging Face model-id style (`UserID/RepoID`) and select the variant with `subfolder`:
```python
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "BiliSakura/NiT-diffusers",
    subfolder="NiT-XL",
    custom_pipeline="pipeline.py",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
)
pipe.to("cuda")
```
```
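The local and Hub loading calls differ only in a few arguments, which a script that supports both can factor out. The helper below is a sketch under that assumption: `build_load_args` is a hypothetical name, and the keyword arguments it returns are the ones used in the examples in this README.

```python
from pathlib import Path

HUB_REPO_ID = "BiliSakura/NiT-diffusers"


def build_load_args(variant, local_root=None):
    """Return (model_ref, kwargs) for DiffusionPipeline.from_pretrained.

    With local_root=None the variant is selected from the Hub repo via
    `subfolder`; otherwise the variant directory under local_root is used.
    """
    if local_root is None:
        return HUB_REPO_ID, {
            "subfolder": variant,
            "custom_pipeline": "pipeline.py",
            "trust_remote_code": True,
        }
    model_dir = (Path(local_root) / variant).resolve()
    return str(model_dir), {
        "local_files_only": True,
        "custom_pipeline": str(model_dir / "pipeline.py"),
        "trust_remote_code": True,
    }
```

A caller would then write `ref, kwargs = build_load_args("NiT-XL")` followed by `DiffusionPipeline.from_pretrained(ref, torch_dtype=torch.bfloat16, **kwargs)`, or pass `local_root="."` to load from a local checkout.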
## Citation
```bibtex
@article{wang2025native,
title={Native-Resolution Image Synthesis},
author={Wang, Zidong and Bai, Lei and Yue, Xiangyu and Ouyang, Wanli and Zhang, Yiyuan},
year={2025},
eprint={2506.03131},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
```