GPT-2 QAT (s/σ=2.0, 4000 total steps)

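Quantization-aware training (QAT) experiment on GPT-2: each layer's weights are rounded to a grid with step s = 2.0 × σ, where σ is that layer's weight standard deviation, and the model is trained on WikiText-2 with the straight-through estimator (STE).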
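As a rough illustration of the idea (not the training code used for this model), the sketch below rounds a toy weight list to the grid and applies one STE update. The function names `quantize` and `ste_step`, the toy loss, and the use of the population standard deviation with a zero-centered round-to-nearest grid are all assumptions made for the example.

```python
import statistics

def quantize(weights, ratio=2.0):
    """Round each weight to the nearest multiple of step = ratio * std(weights).

    Assumption: the grid is centered at zero and std is the population std.
    Python's round() resolves exact .5 ties to the nearest even integer.
    """
    step = ratio * statistics.pstdev(weights)
    if step == 0.0:
        return list(weights), step
    return [step * round(w / step) for w in weights], step

def ste_step(latent, grad_wrt_quantized, lr):
    """One SGD update using the straight-through estimator.

    Rounding has zero gradient almost everywhere, so STE treats it as the
    identity in the backward pass: the gradient evaluated at the quantized
    weights is applied directly to the full-precision latent weights.
    """
    return [w - lr * g for w, g in zip(latent, grad_wrt_quantized)]

# Toy loss: 0.5 * sum((q - t)^2), so dLoss/dq = q - t.
latent = [0.1, -0.2, 3.0, -3.0]
target = [0.0, 0.0, 2.0, -2.0]
quantized, step = quantize(latent)          # forward pass uses grid values
grads = [q - t for q, t in zip(quantized, target)]
latent = ste_step(latent, grads, lr=0.1)    # update the latent weights
```

The forward pass only ever sees grid values, while the optimizer keeps adjusting the underlying full-precision weights, which is what lets small gradients accumulate until a weight crosses to a neighboring grid point.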

Training history

Phase              Steps  LR    Perplexity
§16b initial run   2000   1e-5  218.73
§16c continuation  2000   3e-6  216.28

Results

Model                   Perplexity
fp32 baseline           40.10
QAT (4000 total steps)  216.28

Average weight sparsity: 71.6%
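To put the perplexity gap in loss terms, the short sketch below converts the reported values to mean per-token negative log-likelihood. It assumes the usual convention that perplexity is the exponential of the mean token-level cross-entropy in nats; the helper name `nll_from_perplexity` is made up for this example.

```python
import math

def nll_from_perplexity(ppl):
    """Mean per-token negative log-likelihood in nats, assuming ppl = exp(NLL)."""
    return math.log(ppl)

baseline_ppl = 40.10   # fp32 baseline, from the table above
qat_ppl = 216.28       # QAT after 4000 total steps, from the table above

gap = nll_from_perplexity(qat_ppl) - nll_from_perplexity(baseline_ppl)
print(f"fp32 loss: {nll_from_perplexity(baseline_ppl):.3f} nats/token")
print(f"QAT loss:  {nll_from_perplexity(qat_ppl):.3f} nats/token")
print(f"gap:       {gap:.3f} nats/token")
print(f"perplexity ratio: {qat_ppl / baseline_ppl:.2f}x")
```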

Quantization details

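  • Grid step s = ratio × layer_std, with ratio = 2.0
  • Sparsity ≈ 72% (weights that round to zero; under round-to-nearest these are the weights with |w| < s/2, i.e. within one layer_std of zero)
  • Potential arithmetic simplification via common-subexpression elimination (CSE) on the sparse grid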
  • See erncyp/experiments for full methodology
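The sparsity figure can be sanity-checked with a small sketch like the one below. It measures the fraction of weights that round to zero for a given ratio and compares it with the value expected if the weights were exactly Gaussian. The function names, the synthetic sample, and the round-to-nearest, zero-centered grid are assumptions for illustration; real GPT-2 layers need not be Gaussian.

```python
import math
import random
import statistics

def zero_fraction(weights, ratio=2.0):
    """Fraction of weights that round to zero on a grid with step ratio * std."""
    step = ratio * statistics.pstdev(weights)
    if step == 0.0:
        raise ValueError("all weights are identical; grid step is zero")
    return sum(1 for w in weights if round(w / step) == 0) / len(weights)

def gaussian_zero_fraction(ratio=2.0):
    """P(|w| < ratio * sigma / 2) for w ~ N(0, sigma^2)."""
    return math.erf(ratio / (2.0 * math.sqrt(2.0)))

# Synthetic stand-in for one layer's weights (hypothetical scale 0.02).
random.seed(0)
sample = [random.gauss(0.0, 0.02) for _ in range(100_000)]

print(f"empirical zero fraction: {zero_fraction(sample):.3f}")
print(f"Gaussian prediction:     {gaussian_zero_fraction():.3f}")  # ≈ 0.683
```

For ratio = 2.0 a Gaussian layer would give about 68% zeros, so the reported 71.6% suggests the trained weights are somewhat more concentrated near zero than a Gaussian with the same standard deviation.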