Dose-Response C2 (8M-5%) SFT: Unsafe oversampled to 5%

Supervised fine-tune (SFT) of the corresponding base model on Alchemist (3,350 images) for 20,000 iterations.

Dataset	Alchemist (3,350 images)
Iterations	20,000
Samples seen	~5.12M
Global batch size	256
Microbatch (per GPU)	32
Hardware	8× NVIDIA H200
Precision	bfloat16 (amp_bf16)
Optimizer (transformer blocks)	Muon (lr=1e-4, momentum=0.95, nesterov, ns_steps=5, weight_decay=0)
Optimizer (other params)	AdamW (lr=1e-4, β=(0.9, 0.95), eps=1e-8, weight_decay=0)
LR schedule	1,000-step linear warmup, constant after
EMA	decay 0.999, started at step 0
Random seed	42
Trainer	Composer + FSDP
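
The budget figures in the table can be cross-checked with a few lines of arithmetic. The sketch below uses only the values listed above; it assumes that all 8 GPUs act as data-parallel workers and that the microbatch is the per-GPU batch for a single forward/backward pass, neither of which is stated explicitly here. The variable names are illustrative and are not taken from config.yaml.

```python
# Values copied from the hyperparameter table.
iterations = 20_000
global_batch_size = 256
microbatch_per_gpu = 32
num_gpus = 8
dataset_size = 3_350  # Alchemist images

# Total samples processed over the run.
samples_seen = iterations * global_batch_size

# Assumption: pure data parallelism, so one forward/backward pass
# across all GPUs covers microbatch_per_gpu * num_gpus samples.
samples_per_pass = microbatch_per_gpu * num_gpus
grad_accum_steps = global_batch_size // samples_per_pass

# How many times the run cycles through the fine-tuning set.
passes_over_dataset = samples_seen / dataset_size

print(f"samples seen: {samples_seen:,}")
print(f"gradient accumulation steps: {grad_accum_steps}")
print(f"approx. passes over dataset: {passes_over_dataset:.0f}")
```

Under these assumptions the sample count comes to 5,120,000, which matches the ~5.12M in the table, and no gradient accumulation is needed because 32 × 8 already equals the global batch size. With a 3,350-image dataset, the run revisits each image roughly 1,500 times.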

Trained with the PRX framework (Composer + FSDP). The full config.yaml is included for reproducibility.
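
For readers who want to picture the LR schedule and EMA rows of the table, here is a minimal framework-free sketch. It is not the PRX or Composer implementation: the function names `lr_at` and `ema_update` are hypothetical, and the exact warmup shape (starting from zero and scaling by `step / warmup_steps`) is an assumption, since the table only says "1,000-step linear warmup, constant after".

```python
def lr_at(step, base_lr=1e-4, warmup_steps=1_000):
    """Learning rate at a given step: linear warmup, then constant.

    Assumption: warmup starts at 0 and reaches base_lr at warmup_steps.
    """
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    return base_lr


def ema_update(ema_weights, weights, decay=0.999):
    """One EMA step: ema <- decay * ema + (1 - decay) * weights."""
    return [decay * e + (1.0 - decay) * w for e, w in zip(ema_weights, weights)]


# Usage: inspect the schedule at a few steps.
for step in (0, 500, 1_000, 20_000):
    print(step, lr_at(step))

# Usage: EMA tracking from step 0, with the EMA copy initialised
# from the starting weights (toy two-parameter example).
weights = [1.0, -2.0]
ema = list(weights)
weights = [1.5, -1.0]  # pretend one optimizer step changed the weights
ema = ema_update(ema, weights)
print(ema)
```

With a decay of 0.999 the EMA averages over roughly the last 1,000 steps, so over a 20,000-step run the EMA weights mostly reflect the later part of training.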
