---
license: mit
library_name: diffusers
pipeline_tag: text-to-image
tags:
- diffusers
- cafm
- continuous-adversarial-flow-models
- class-conditional
- imagenet
- text-to-image
- z-image
inference: true
widget:
- output:
    url: CAFM-JiT-H-16-256/demo.png
language:
- en
---

# BiliSakura/CAFM-diffusers

Self-contained [Continuous Adversarial Flow Models](https://arxiv.org/abs/2604.11521) (CAFM) checkpoints for Hugging Face diffusers.

Converted from `ByteDance-Seed/Adversarial-Flow-Models` using `libs/AFM-diffusers/scripts/convert_cafm_to_diffusers.py`.
The Z-Image weights are bundled under `CAFM-Z-Image-T2I/`, so that folder is self-contained as well.
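The inference examples below load a variant from a local folder with `local_files_only=True`, so that folder has to be on disk first. A minimal sketch using `huggingface_hub.snapshot_download` to fetch a single variant folder, assuming the Hub repository id matches this card's title (`BiliSakura/CAFM-diffusers`); `variant_patterns` and `download_variant` are hypothetical helpers, not part of this repository.

```python
from pathlib import Path

# Variant folders listed in this card.
VARIANTS = ("CAFM-JiT-H-16-256", "CAFM-SiT-XL-2-256", "CAFM-Z-Image-T2I")
# Assumption: the Hub repository id matches the card title.
REPO_ID = "BiliSakura/CAFM-diffusers"


def variant_patterns(variant: str) -> list[str]:
    """Return allow_patterns that restrict a download to one variant folder."""
    if variant not in VARIANTS:
        raise ValueError(f"unknown variant {variant!r}; expected one of {VARIANTS}")
    return [f"{variant}/*"]


def download_variant(variant: str, local_dir: str = ".") -> Path:
    """Fetch a single variant folder into local_dir and return its path (hypothetical helper)."""
    # Imported lazily so variant_patterns works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    snapshot_download(
        repo_id=REPO_ID,
        allow_patterns=variant_patterns(variant),
        local_dir=local_dir,
    )
    return Path(local_dir) / variant
```

For example, `model_dir = download_variant("CAFM-SiT-XL-2-256")` would place the folder at `./CAFM-SiT-XL-2-256`, the path the class-conditional example expects.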

## Demo

`CAFM-JiT-H-16-256`: class **207** (*golden retriever*), seed **0**, 100 NFE (Heun):

<p align="center">
  <img src="CAFM-JiT-H-16-256/demo.png" alt="CAFM-JiT-H-16-256 demo (class 207, seed 0)" width="256"/>
</p>

Each variant folder includes `demo.png` generated with the same prompt settings.

## Benchmark results (ImageNet 256×256)

| Model | Space | NFE | FID | Checkpoint |
| --- | --- | --- | --- | --- |
| CAFM JiT-H/16 | pixel | 100 | 1.80 | `CAFM-JiT-H-16-256/` |
| CAFM SiT-XL/2 | latent | 250 | 1.53 | `CAFM-SiT-XL-2-256/` |
| CAFM Z-Image | latent (T2I) | 25 | n/a | `CAFM-Z-Image-T2I/` |

## Available checkpoints

| Variant | Backbone | Default steps | Default solver (`sampler`) |
| --- | --- | ---: | --- |
| `CAFM-JiT-H-16-256/` | JiT | 100 | `heun` |
| `CAFM-SiT-XL-2-256/` | SiT | 250 | `heun` |
| `CAFM-Z-Image-T2I/` | Z-Image | 25 | `euler` |
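The solver column names the ODE integrator used to follow the learned flow during sampling. The sketch below is a generic textbook illustration, not the code in the bundled `pipeline.py`: it integrates a scalar toy velocity field (`toy_velocity`, an assumption chosen because its exact solution is known) from t=0 to t=1 and counts velocity evaluations. In this form Euler costs one evaluation per step, while Heun costs two (a predictor plus a trapezoidal correction) in exchange for second-order accuracy.

```python
import math
from typing import Callable

Velocity = Callable[[float, float], float]


def euler(v: Velocity, x: float, n_steps: int) -> tuple[float, int]:
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with explicit Euler."""
    h, t, nfe = 1.0 / n_steps, 0.0, 0
    for _ in range(n_steps):
        x = x + h * v(x, t)
        nfe += 1
        t += h
    return x, nfe


def heun(v: Velocity, x: float, n_steps: int) -> tuple[float, int]:
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with Heun's (2nd-order) method."""
    h, t, nfe = 1.0 / n_steps, 0.0, 0
    for _ in range(n_steps):
        d1 = v(x, t)                 # slope at the start of the step
        x_pred = x + h * d1          # Euler predictor
        d2 = v(x_pred, t + h)        # slope at the predicted end point
        x = x + 0.5 * h * (d1 + d2)  # trapezoidal corrector
        nfe += 2
        t += h
    return x, nfe


def toy_velocity(x: float, t: float) -> float:
    """Toy field with known solution x(1) = x0 * exp(-1)."""
    return -x


exact = math.exp(-1.0)
x_e, nfe_e = euler(toy_velocity, 1.0, 10)
x_h, nfe_h = heun(toy_velocity, 1.0, 10)
print(f"euler: err={abs(x_e - exact):.2e}, nfe={nfe_e}")
print(f"heun:  err={abs(x_h - exact):.2e}, nfe={nfe_h}")
```

A real sampler applies the same updates to image or latent tensors, with the model's predicted velocity in place of `toy_velocity`.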

## Inference

### ImageNet class-conditional (JiT / SiT)

```python
from pathlib import Path
import torch
from diffusers import DiffusionPipeline

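# Local copy of the variant folder; local_files_only=True skips the Hub.
# For CAFM-JiT-H-16-256, point model_dir there and use num_inference_steps=100.
model_dir = Path("./CAFM-SiT-XL-2-256")
pipe = DiffusionPipeline.from_pretrained(
    str(model_dir),
    local_files_only=True,
    custom_pipeline=str(model_dir / "pipeline.py"),  # pipeline code shipped with the checkpoint
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
).to("cuda")

# Steps and sampler follow the "Available checkpoints" table.
image = pipe(class_labels="golden retriever", num_inference_steps=250, sampler="heun").images[0]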
```
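The example above hard-codes the SiT-XL/2 settings. A hypothetical helper, `sampling_defaults` (not part of the pipeline), can look up the step count and solver from the "Available checkpoints" table, so switching to `CAFM-JiT-H-16-256` only changes `model_dir`. It assumes the custom pipeline accepts `num_inference_steps` and `sampler` as keyword arguments, as in the example.

```python
from pathlib import Path
from typing import Union

# Defaults copied from the "Available checkpoints" table in this card.
SAMPLING_DEFAULTS = {
    "CAFM-JiT-H-16-256": {"num_inference_steps": 100, "sampler": "heun"},
    "CAFM-SiT-XL-2-256": {"num_inference_steps": 250, "sampler": "heun"},
    "CAFM-Z-Image-T2I": {"num_inference_steps": 25, "sampler": "euler"},
}


def sampling_defaults(model_dir: Union[str, Path]) -> dict:
    """Return the card's default sampler settings for a variant folder (hypothetical helper)."""
    name = Path(model_dir).name  # e.g. "./CAFM-JiT-H-16-256" -> "CAFM-JiT-H-16-256"
    try:
        return dict(SAMPLING_DEFAULTS[name])  # copy so callers can tweak it safely
    except KeyError:
        raise ValueError(f"no defaults recorded for {name!r}") from None
```

Usage: `image = pipe(class_labels="golden retriever", **sampling_defaults(model_dir)).images[0]`.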

### Text-to-image (Z-Image)

```python
from pathlib import Path
import torch
from diffusers import DiffusionPipeline

model_dir = Path("./CAFM-Z-Image-T2I")
pipe = DiffusionPipeline.from_pretrained(
    str(model_dir),
    local_files_only=True,
    custom_pipeline=str(model_dir / "pipeline.py"),
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()  # recommended for single-GPU inference

image = pipe(
    prompt="A golden retriever sitting in a sunny park, photo realistic.",
    height=512,
    width=512,
    num_inference_steps=25,
    sampler="euler",
).images[0]
```