Dropout Decay Streaming Experiments
Archived preprint and reproducibility snapshot: https://doi.org/10.5281/zenodo.20616633
This project tests dropout decay only after first finding a model/data regime where static dropout has a real nonzero validation optimum.
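As a rough illustration of what a nonzero static optimum could mean in practice, the sketch below checks a static sweep for a best dropout rate that is above zero and beats the no-dropout baseline. The function name, the input mapping, and the min_margin threshold are hypothetical; the project's actual selection criteria live in its model-selection outputs and may differ.

```python
def nonzero_optimum(mean_val_loss, min_margin=0.0):
    """Return the best dropout rate if it is nonzero and beats p=0, else None.

    mean_val_loss maps dropout rate -> mean validation loss and must include 0.0.
    min_margin is a hypothetical noise threshold, not a value from the project.
    """
    # Dropout rate with the lowest mean validation loss (first one wins on ties).
    best_p = min(mean_val_loss, key=mean_val_loss.get)
    # Improvement of the best rate over the no-dropout baseline.
    gain = mean_val_loss[0.0] - mean_val_loss[best_p]
    if best_p > 0.0 and gain > min_margin:
        return best_p
    return None
```

A regime where this returns None would not be a useful place to test decay schedules, since static dropout is not helping there in the first place.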
The implementation is derived from Andrej Karpathy's nanochat repository:
https://github.com/karpathy/nanochat. Only the core tokenizer ideas and
foundational causal Transformer architecture are retained. Chat interfaces,
deployment scripts, distributed training code, and inference services are not
included. The original nanochat MIT copyright and permission notice are retained
in derived source files and in LICENSE.
Compliance
All Torch experiment runs are MPS-only. The runner exits before model creation if
MPS is unavailable, if PyTorch was not built with MPS, or if
PYTORCH_ENABLE_MPS_FALLBACK=1 is set.
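The three exit conditions above can be sketched as a small guard. This is a minimal sketch, not the runner's actual code: mps_violation and require_mps are hypothetical names, and only torch.backends.mps.is_built(), torch.backends.mps.is_available(), and the PYTORCH_ENABLE_MPS_FALLBACK variable come from PyTorch itself.

```python
import os


def mps_violation(env, mps_built, mps_available):
    """Return a reason string if the MPS-only rule is violated, else None."""
    if not mps_built:
        return "PyTorch was not built with MPS support"
    if not mps_available:
        return "MPS is not available on this machine"
    if env.get("PYTORCH_ENABLE_MPS_FALLBACK") == "1":
        return "PYTORCH_ENABLE_MPS_FALLBACK=1 is set"
    return None


def require_mps():
    """Exit before any model is created if the MPS-only rule is violated."""
    import torch  # imported lazily so the pure check above needs no PyTorch

    reason = mps_violation(
        os.environ,
        torch.backends.mps.is_built(),
        torch.backends.mps.is_available(),
    )
    if reason is not None:
        raise SystemExit(f"MPS-only run refused: {reason}")
    return torch.device("mps")
```

Keeping the decision in a pure function means the rule can be unit-tested on machines without MPS, while require_mps() is the only part that touches PyTorch.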
Local Data and Environment
The project should not depend on another checkout of nanochat at runtime. Use
the project-local package and either:
- --use-cached-data --cache-dir .cache/dropout_decay to reuse the local tokenizer and encoded token array; or
- --corpus / --corpus-glob to build a fresh local cache from a source corpus.
The curated repo-local data artifacts are:
- data/openwebtext10k/base_data_climbmix/shard_*.parquet
- data/openwebtext10k/openwebtext10k.txt
- data/tinystories/train-00000-of-00004.parquet
- data/wikitext103_raw/train-00001-of-00002.parquet
The existing repo-local caches are:
- .cache/dropout_decay/tokenizer-v4096.json
- .cache/dropout_decay/tokens-v4096-uint16.npy
- .cache/dropout_decay_tinystories/tokenizer-v4096.json
- .cache/dropout_decay_tinystories/tokens-v4096-uint16.npy
- .cache/dropout_decay_wikitext103/tokenizer-v4096.json
- .cache/dropout_decay_wikitext103/tokens-v4096-uint16.npy
- .cache/dropout_decay_openwebtext10k_8m/tokenizer-v4096.json
- .cache/dropout_decay_openwebtext10k_8m/tokens-v4096-uint16.npy
- .cache/dropout_decay_tinystories_8m/tokenizer-v4096.json
- .cache/dropout_decay_tinystories_8m/tokens-v4096-uint16.npy
- .cache/dropout_decay_wikitext103_8m/tokenizer-v4096.json
- .cache/dropout_decay_wikitext103_8m/tokens-v4096-uint16.npy
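Each cache directory listed above holds one tokenizer file and one encoded token array. Before passing a directory to --cache-dir, a quick completeness check along these lines may help. This is a sketch only: the helper names are hypothetical, and the file-name pattern is inferred from the paths listed above rather than taken from the project code.

```python
from pathlib import Path


def cache_files(cache_dir, vocab_size=4096):
    """Return the two files a cache directory is expected to hold."""
    root = Path(cache_dir)
    return [
        root / f"tokenizer-v{vocab_size}.json",
        root / f"tokens-v{vocab_size}-uint16.npy",
    ]


def missing_cache_files(cache_dir, vocab_size=4096):
    """List expected cache files that are absent from cache_dir."""
    return [p for p in cache_files(cache_dir, vocab_size) if not p.is_file()]
```

For example, missing_cache_files(".cache/dropout_decay_tinystories") returns an empty list when both files are present.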
Use a project-local Python environment with MPS-capable PyTorch, for example
.venv/bin/python. Attribution to nanochat remains in the source and docs, but
experiment commands should not point into a separate nanochat repository.
Fast non-training checks can be run without launching an MPS experiment:
PYTHONPATH=src .venv/bin/python -m unittest discover -s tests -v
PYTHONPATH=src .venv/bin/python scripts/generate_paper_assets.py
Current Paper Evidence Path
The paper is backed by saved artifacts rather than by rerunning long MPS jobs during manuscript generation. The current evidence path is:
- Static calibration cells in runs/coefficient_calibration/ fit regime-specific interaction coefficients.
- Locked-stream validation runs in runs/ compare frozen coefficient-derived schedules against broad fixed-dropout grids.
- scripts/generate_paper_assets.py reads those saved metrics and coefficients to regenerate every paper table, figure, and paper/paper_results_summary.json.
- paper/dropout_decay_regime_specific_pressure_law.tex builds the final paper.
The headline paper artifacts are:
| Regime | Coefficients | Main 4M validation | L20 8M validation |
|---|---|---|---|
| OpenWebText10K | runs/coefficient_calibration/cross_regime_backtest/openwebtext10k_main_interaction/coefficients.json | runs/openwebtext10k_l16_updated_formula_clean_5seed/locked_stream/20260530-174525/ | runs/openwebtext10k_l20_8m_extrapolation_3seed/locked_stream/20260601-191150/ |
| TinyStories | runs/coefficient_calibration/tinystories_combined_plus_all_holdouts_interaction/coefficients.json | runs/streaming_tinystories_interaction_schedule_l12/locked_stream/20260530-053831/, runs/streaming_tinystories_multiseed_validation_l12/locked_stream/20260530-111523/, and runs/streaming_tinystories_multiseed_validation_l12/locked_stream/20260530-141335/ | runs/tinystories_l20_8m_extrapolation_3seed/locked_stream/20260602-112249/ |
| WikiText-103 | runs/coefficient_calibration/wikitext103_interaction/coefficients.json | runs/wikitext103_l12_streaming_validation_5seed/locked_stream/20260531-093525/ | runs/wikitext103_l20_8m_extrapolation_3seed/locked_stream/20260603-231952/ |
Every run writes:
- config.json: command, model specs, data paths, environment, attribution.
- metrics.jsonl: one row per seed/model/dropout/stage.
- trace.jsonl: optional training and intermediate evaluation trace.
- summary.csv/summary.json: mean/std train loss, validation loss, and gap.
- model_selection.csv/model_selection.json: static-sweep optimum and plateau diagnostics for screen and confirm runs.
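To show how per-seed rows in metrics.jsonl relate to the mean/std figures in the summary files, here is a minimal aggregation sketch. The field names (model, dropout, stage, val_loss) are assumptions about the row schema, and the use of the sample standard deviation is also an assumption; the project's own summary code may differ on both.

```python
import json
import statistics
from collections import defaultdict


def summarize(lines, group_keys=("model", "dropout", "stage"), value_key="val_loss"):
    """Group JSONL rows and return {group: (mean, std, n)} for value_key.

    Field names are hypothetical; adjust them to the real metrics.jsonl schema.
    """
    groups = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the file
        row = json.loads(line)
        groups[tuple(row[k] for k in group_keys)].append(row[value_key])
    summary = {}
    for key, values in groups.items():
        # Sample standard deviation needs at least two seeds.
        std = statistics.stdev(values) if len(values) > 1 else 0.0
        summary[key] = (statistics.fmean(values), std, len(values))
    return summary
```

Because the function accepts any iterable of lines, an open file handle for a run's metrics.jsonl can be passed to it directly.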
Old exploratory outputs are archived under archive/; they are not part of the
primary evidence path for the paper.
For exact headline reproduction, see REPRODUCING.md. Current documentation is:
- docs/plan.md: regime-validation and coefficient-fitting plan.
- docs/formula_coefficient_methodology.md: coefficient target extraction, weighting, and fitting details.
- docs/openwebtext10k_streaming_report.md: OpenWebText10K regime report.
- docs/tinystories_streaming_report.md: TinyStories regime report.
- docs/wikitext103_streaming_report.md: WikiText-103 regime report.
- paper/dropout_decay_regime_specific_pressure_law.tex: paper source.
- paper/dropout_decay_regime_specific_pressure_law.pdf: generated paper PDF.