Readout recipe-control models (31M, 4 matched arms)

Four matched 31M-parameter Pythia-style (GPTNeoX architecture) pretraining runs, used as the optimizer-recipe sensitivity control in "Learning to Read Out: Unembedding Dynamics in Language Model Pretraining" (appendix, fig:app-recipe-control-geometry). All arms share one tokenized Pile slice (pile_10B_seed1234.bin, GPT-NeoX-20B tokenizer, data_seed=1234) and the same parameter seed (0), data order, fp16 precision, global batch 1024 (2,097,152 tokens/step), weight decay 0.1, and 10B-token budget. Each arm other than the baseline perturbs exactly one recipe axis:

| Arm | Perturbation |
|---|---|
| baseline/ | none (peak_lr 1e-3, warmup 1430 steps, W_U lr multiplier 1.0) |
| long_warmup/ | extended LR warmup |
| wu_lr_0p25/ | output-readout (W_U) learning-rate multiplier 0.25× |
| wu_lr_4x/ | output-readout (W_U) learning-rate multiplier 4× |
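The figures quoted above can be cross-checked with a few lines of arithmetic. The following is a minimal sketch, not part of the release. It assumes that the 10B-token budget means exactly 10**10 tokens and that the W_U multiplier scales the base peak learning rate multiplicatively; the variable names are illustrative. The extended warmup length of long_warmup/ is not stated in this card, so that arm is left out (its value should be in the arm's config.json).

```python
# Values quoted in this card.
GLOBAL_BATCH = 1024
TOKENS_PER_STEP = 2_097_152
PEAK_LR = 1e-3
WARMUP_STEPS = 1430
WU_LR_MULTIPLIER = {"baseline": 1.0, "wu_lr_0p25": 0.25, "wu_lr_4x": 4.0}

# Assumption: "10B-token budget" is taken to be exactly 10**10 tokens.
TOKEN_BUDGET = 10**10

# Tokens per batch element implied by the two batch figures.
seq_len = TOKENS_PER_STEP // GLOBAL_BATCH        # 2048

# Number of whole optimizer steps that fit in the budget.
full_steps = TOKEN_BUDGET // TOKENS_PER_STEP     # 4768

# Share of those steps spent in the baseline warmup (roughly 0.30).
warmup_fraction = WARMUP_STEPS / full_steps

# Assumption: the multiplier simply rescales the peak LR applied to W_U.
wu_peak_lr = {arm: PEAK_LR * mult for arm, mult in WU_LR_MULTIPLIER.items()}

if __name__ == "__main__":
    print(f"sequence length: {seq_len}")
    print(f"full steps in budget: {full_steps}")
    print(f"baseline warmup fraction: {warmup_fraction:.3f}")
    for arm, lr in wu_peak_lr.items():
        print(f"{arm}: W_U peak lr = {lr:g}")
```

Under these assumptions, the W_U peak learning rate spans a 16× range between wu_lr_0p25/ and wu_lr_4x/ while every other parameter keeps the baseline rate.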

Each arm ships config.json (full training config), metrics.csv (train/val curves), and ckpts/step<N>/model_fp16.pt checkpoint trajectories at token-milestone steps.
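Given that layout, one way to walk an arm's checkpoint trajectory in training order is sketched below. It relies only on the file names listed above; the helper names list_checkpoints and load_arm_metadata are hypothetical and not part of the code release. No assumption is made about the column names in metrics.csv or the contents of model_fp16.pt. Step directories are sorted numerically because a plain string sort would put step100 before step20 if the step numbers are not zero-padded.

```python
import csv
import json
import re
from pathlib import Path

# Matches directory names of the form step<N>, e.g. step1000.
STEP_DIR = re.compile(r"^step(\d+)$")


def list_checkpoints(arm_dir):
    """Return [(step, path to model_fp16.pt)] sorted by integer step."""
    ckpt_root = Path(arm_dir) / "ckpts"
    found = []
    if not ckpt_root.is_dir():
        return found
    for child in ckpt_root.iterdir():
        match = STEP_DIR.match(child.name)
        weights = child / "model_fp16.pt"
        # Keep only step<N> directories that actually contain weights.
        if match and weights.is_file():
            found.append((int(match.group(1)), weights))
    return sorted(found, key=lambda item: item[0])


def load_arm_metadata(arm_dir):
    """Load config.json as a dict and metrics.csv as a list of row dicts."""
    arm_dir = Path(arm_dir)
    with open(arm_dir / "config.json", encoding="utf-8") as f:
        config = json.load(f)
    with open(arm_dir / "metrics.csv", newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))  # column names are taken from the file
    return config, rows
```

For example, `for step, path in list_checkpoints("baseline"): ...` visits the baseline checkpoints from earliest to latest, assuming the arm directory has been downloaded locally.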

The trainer, tokenizer pipeline, and slice-building scripts are in the code release (https://github.com/hematteo/learning-to-read-out) under experiments/ablations/pretraining_recipe_control/. To rebuild the exact slice, use trainer/tokenize_slice.py or scripts/fetch_pythia_preshuffled.py; data sources and licences are listed in docs/DATA.md. These checkpoints are research artifacts for analyzing W_U readout geometry across training, not general-purpose language models.

Citation

@misc{he2026learningtoreadout,
  title  = {Learning to Read Out: Unembedding Dynamics in Language Model Pretraining},
  author = {He, Matteo and Shen, William F. and Iacob, Alex and Jovanovic, Andrej
            and Qiu, Xinchi and Lane, Nicholas D.},
  year   = {2026},
  note   = {Under review. Code: https://github.com/hematteo/learning-to-read-out},
}

License: MIT. Trained on a slice of the Pile (monology/pile-uncopyrighted / EleutherAI/pile-standard-pythia-preshuffled); see the Pile's data statement for upstream text provenance.
