GPT-2 QAT (s/σ=2.0, 4000 total steps)
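Quantization-aware training (QAT) experiment on GPT-2: each layer's weights
are rounded to a grid with step s = 2.0 × σ, where σ is that layer's weight
standard deviation. The model is trained with the straight-through estimator
(STE) on WikiText-2.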
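As an illustration of the rounding and the STE update described above, here is a minimal pure-Python sketch on a toy weight list. It is not the experiment's code: the function names `quantize_to_grid` and `ste_sgd_step` are hypothetical, the grid is assumed to be the multiples of s (so it contains zero), σ is assumed to be the population standard deviation, and a toy quadratic loss stands in for the language-model loss.

```python
import random
import statistics

def quantize_to_grid(weights, ratio=2.0):
    """Round each weight to the nearest multiple of s = ratio * std(weights).

    Assumption: the grid is centered at zero and sigma is the population std.
    """
    step = ratio * statistics.pstdev(weights)
    return [step * round(w / step) for w in weights], step

def ste_sgd_step(latent, grad_wrt_quantized, lr):
    """One SGD step with the straight-through estimator.

    Rounding is treated as the identity in the backward pass, so the gradient
    computed for the quantized weights is applied to the latent fp weights.
    """
    return [w - lr * g for w, g in zip(latent, grad_wrt_quantized)]

random.seed(0)
latent = [random.gauss(0.0, 0.02) for _ in range(1000)]  # toy "layer"

# Forward pass uses the quantized weights.
quantized, step = quantize_to_grid(latent)

# Toy loss L = sum(q_i^2) / 2, so dL/dq_i = q_i (stand-in for a real backward pass).
grads = list(quantized)

# Backward pass updates the latent weights, not the quantized ones.
latent = ste_sgd_step(latent, grads, lr=1e-5)

print(f"grid step s = {step:.5f}, distinct levels = {len(set(quantized))}")
```

Keeping full-precision latent weights is what lets small gradient updates accumulate until a weight crosses a rounding boundary; updating the quantized values directly would discard any step smaller than s/2.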
Training history
| Phase | Steps | LR | Perplexity |
|---|---|---|---|
| §16b initial run | 2000 | 1e-5 | 218.73 |
| §16c continuation | 2000 | 3e-6 | 216.28 |
Results
| Metric | Value |
|---|---|
| fp32 baseline perplexity | 40.10 |
| QAT perplexity (4000 total steps) | 216.28 |
| Average weight sparsity | 71.6% |
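As a rough sanity check on the sparsity figure: with a grid step of s = 2.0 × σ, a weight rounds to zero when |w| < s/2 = σ. The sketch below computes the fraction of weights that would fall in that interval if a layer's weights were exactly zero-mean Gaussian. The Gaussian assumption and the helper name `gaussian_zero_fraction` are illustrative and do not come from the experiment.

```python
import math

def gaussian_zero_fraction(ratio):
    """Fraction of N(0, sigma^2) weights with |w| < s/2, where s = ratio * sigma.

    P(|w| < ratio * sigma / 2) = erf(ratio / (2 * sqrt(2))).
    """
    return math.erf(ratio / (2.0 * math.sqrt(2.0)))

print(f"{gaussian_zero_fraction(2.0):.3f}")  # 0.683
```

Under this assumption about 68% of weights round to zero. The reported 71.6% is somewhat higher, which would be consistent with weights more concentrated near zero than a Gaussian of the same standard deviation; that reading is an inference, not something the experiment states.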
Quantization details
- Grid step: s = ratio × layer_std, ratio = 2.0
- Sparsity ≈ 72% (weights rounded to zero)
- Potential for arithmetic simplification via common subexpression elimination (CSE) on the sparse grid
- See erncyp/experiments for full methodology