GPT-2 QAT (s/σ=2.0, 4000 total steps)

Quantization-aware training (QAT) experiment: each layer's weights are rounded to a grid with spacing s = 2.0×σ, where σ is computed per layer, and the model is trained on WikiText-2 using the straight-through estimator (STE).
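The rounding-plus-STE step can be sketched in pure Python. This is a minimal illustration under assumptions the card does not state: σ is taken to be the population standard deviation of each layer's weights, the grid is the set of integer multiples of s centred at zero with no clipping or bit-width limit, and the helper names (`layer_step`, `quantize`, `ste_backward`, `qat_step`) are hypothetical, not the experiment's actual code.

```python
import statistics


def layer_step(weights, ratio=2.0):
    """Grid spacing s = ratio * sigma for one layer.

    Assumption: sigma is the population standard deviation of the layer's weights.
    """
    return ratio * statistics.pstdev(weights)


def quantize(weights, s):
    """Forward pass: snap each weight to the nearest multiple of s.

    Assumes a zero-centred grid with no clipping. Python's round() breaks
    exact .5 ties toward the nearest even integer.
    """
    if s == 0:
        return list(weights)  # degenerate layer (all weights equal): nothing to round
    return [s * round(w / s) for w in weights]


def ste_backward(grad_out):
    """Straight-through estimator: treat d(quantize)/dw as 1.

    The upstream gradient is copied unchanged to the latent float weights,
    even though rounding itself has zero gradient almost everywhere.
    """
    return list(grad_out)


def qat_step(latent, grad_fn, lr, ratio=2.0):
    """One hypothetical plain-SGD QAT update on a single layer's latent weights."""
    s = layer_step(latent, ratio)
    q = quantize(latent, s)        # quantized weights are used in the forward pass
    grad_q = grad_fn(q)            # loss gradient w.r.t. the quantized weights
    grad_w = ste_backward(grad_q)  # STE: pass the gradient through the rounding
    return [w - lr * g for w, g in zip(latent, grad_w)]


weights = [0.1, -0.3, 0.25, -0.05]
s = layer_step(weights)
print(s, quantize(weights, s))
```

The latent weights stay in full precision and are re-quantized on every forward pass; without the STE, the rounding step would block all gradient flow to them.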

Training history

Phase	Steps	LR	Perplexity
§16b initial run	2000	1e-5	218.73
§16c continuation	2000	3e-6	216.28
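Read together, the two phases form the 4000-step run named in the title. A minimal sketch of that history as one piecewise-constant learning-rate schedule follows; it assumes a constant LR within each phase (the table mentions no warmup or decay), and `PHASES` and `lr_at` are hypothetical names.

```python
# Hypothetical reconstruction of the training history table as a single
# piecewise-constant schedule. Assumes constant LR within each phase.
PHASES = [
    ("§16b initial run", 2000, 1e-5),
    ("§16c continuation", 2000, 3e-6),
]

TOTAL_STEPS = sum(steps for _, steps, _ in PHASES)


def lr_at(step):
    """Return the learning rate used at a 0-indexed global step."""
    if step < 0:
        raise ValueError("step must be non-negative")
    start = 0
    for _name, steps, lr in PHASES:
        if step < start + steps:
            return lr
        start += steps
    raise ValueError(f"step {step} is beyond the {TOTAL_STEPS}-step schedule")


print(TOTAL_STEPS, lr_at(0), lr_at(2000))
```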

Safetensors

Model size: 0.1B params
Tensor type: F32
