Dropout Decay Streaming Experiments

Archived preprint and reproducibility snapshot: https://doi.org/10.5281/zenodo.20616633

This project tests dropout decay only in model/data regimes where static dropout has first been shown to have a nonzero validation optimum.

The implementation is derived from Andrej Karpathy's nanochat repository: https://github.com/karpathy/nanochat. Only the core tokenizer ideas and the foundational causal Transformer architecture are retained. Chat interfaces, deployment scripts, distributed training code, and inference services are not included. The original nanochat MIT copyright and permission notice is retained in derived source files and in LICENSE.

Compliance

All Torch experiment runs are MPS-only. The runner exits before model creation if MPS is unavailable, if PyTorch was not built with MPS, or if PYTORCH_ENABLE_MPS_FALLBACK=1 is set.
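
The three exit conditions above can be pictured as a small guard that runs before any model is built. The following is a minimal sketch, not the project's actual runner code: the function names mps_compliance_error and require_mps are hypothetical, and only torch.backends.mps.is_built(), torch.backends.mps.is_available(), and the PYTORCH_ENABLE_MPS_FALLBACK variable are taken as given.

```python
import os


def mps_compliance_error(is_built, is_available, env):
    """Return a reason string if the MPS-only policy is violated, else None.

    Hypothetical helper: the inputs are passed in explicitly so the policy
    can be checked without importing torch.
    """
    if not is_built:
        return "PyTorch was not built with MPS support"
    if not is_available:
        return "MPS is not available on this machine"
    if env.get("PYTORCH_ENABLE_MPS_FALLBACK") == "1":
        return "PYTORCH_ENABLE_MPS_FALLBACK=1 is set"
    return None


def require_mps():
    """Exit before model creation if any MPS-only condition fails."""
    import torch  # imported lazily so the pure check above stays importable

    reason = mps_compliance_error(
        torch.backends.mps.is_built(),
        torch.backends.mps.is_available(),
        os.environ,
    )
    if reason is not None:
        raise SystemExit(f"MPS compliance check failed: {reason}")
    return torch.device("mps")
```

Rejecting the fallback variable matters because, with it set, operators that MPS does not support would silently run on the CPU, so a run could no longer be described as MPS-only.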

Local Data and Environment

The project should not depend on another checkout of nanochat at runtime. Use the project-local package and either:

  • --use-cached-data --cache-dir .cache/dropout_decay to reuse the local tokenizer and encoded token array; or
  • --corpus / --corpus-glob to build a fresh local cache from a source corpus.
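
As an illustration of how the two data-source modes could be told apart, here is a hedged argparse sketch. The flag names come from the list above; everything else is an assumption, including the helper names, the treatment of the two modes as mutually exclusive, and the absence of defaults. The real runner may parse these options differently.

```python
import argparse


def build_data_parser():
    """Hypothetical parser covering only the data-source flags."""
    parser = argparse.ArgumentParser(description="data-source options (illustrative)")
    parser.add_argument("--use-cached-data", action="store_true")
    parser.add_argument("--cache-dir", default=None)
    parser.add_argument("--corpus", default=None)
    parser.add_argument("--corpus-glob", default=None)
    return parser


def resolve_data_source(args):
    """Return "cache" or "corpus"; raise ValueError on an ambiguous request.

    Assumption: exactly one of the two modes must be selected.
    """
    has_corpus = args.corpus is not None or args.corpus_glob is not None
    if args.use_cached_data and has_corpus:
        raise ValueError("choose either cached data or a source corpus, not both")
    if args.use_cached_data:
        if args.cache_dir is None:
            raise ValueError("--use-cached-data needs --cache-dir")
        return "cache"
    if has_corpus:
        return "corpus"
    raise ValueError("no data source given")
```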

The curated repo-local data artifacts are:

  • data/openwebtext10k/base_data_climbmix/shard_*.parquet
  • data/openwebtext10k/openwebtext10k.txt
  • data/tinystories/train-00000-of-00004.parquet
  • data/wikitext103_raw/train-00001-of-00002.parquet

The existing repo-local caches are:

  • .cache/dropout_decay/tokenizer-v4096.json
  • .cache/dropout_decay/tokens-v4096-uint16.npy
  • .cache/dropout_decay_tinystories/tokenizer-v4096.json
  • .cache/dropout_decay_tinystories/tokens-v4096-uint16.npy
  • .cache/dropout_decay_wikitext103/tokenizer-v4096.json
  • .cache/dropout_decay_wikitext103/tokens-v4096-uint16.npy
  • .cache/dropout_decay_openwebtext10k_8m/tokenizer-v4096.json
  • .cache/dropout_decay_openwebtext10k_8m/tokens-v4096-uint16.npy
  • .cache/dropout_decay_tinystories_8m/tokenizer-v4096.json
  • .cache/dropout_decay_tinystories_8m/tokens-v4096-uint16.npy
  • .cache/dropout_decay_wikitext103_8m/tokenizer-v4096.json
  • .cache/dropout_decay_wikitext103_8m/tokens-v4096-uint16.npy
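
Each cache directory listed above holds the same pair of files, so a quick completeness check before passing --use-cached-data can save a failed launch. The sketch below assumes the naming pattern inferred from that list and that the v4096 suffix denotes the vocabulary size (which would fit comfortably in uint16); the helper names are hypothetical.

```python
from pathlib import Path


def cache_paths(cache_dir, vocab_size=4096):
    """Build the two expected file paths for one cache directory.

    Assumption: file names follow the tokenizer-v<N>.json and
    tokens-v<N>-uint16.npy pattern seen in the listing.
    """
    root = Path(cache_dir)
    return {
        "tokenizer": root / f"tokenizer-v{vocab_size}.json",
        "tokens": root / f"tokens-v{vocab_size}-uint16.npy",
    }


def missing_cache_files(cache_dir, vocab_size=4096):
    """Return the expected cache files that do not exist on disk."""
    paths = cache_paths(cache_dir, vocab_size)
    return [str(path) for path in paths.values() if not path.is_file()]
```

An empty list from missing_cache_files(".cache/dropout_decay") would indicate that both the tokenizer and the encoded token array are present.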

Use a project-local Python environment with MPS-capable PyTorch, for example .venv/bin/python. Attribution to nanochat remains in the source and docs, but experiment commands should not point into a separate nanochat repository.

Fast non-training checks can be run without launching an MPS experiment:

PYTHONPATH=src .venv/bin/python -m unittest discover -s tests -v
PYTHONPATH=src .venv/bin/python scripts/generate_paper_assets.py

Current Paper Evidence Path

The paper is backed by saved artifacts rather than by rerunning long MPS jobs during manuscript generation. The current evidence path is:

  1. Static calibration cells in runs/coefficient_calibration/ fit regime-specific interaction coefficients.
  2. Locked-stream validation runs in runs/ compare frozen coefficient-derived schedules against broad fixed-dropout grids.
  3. scripts/generate_paper_assets.py reads those saved metrics and coefficients to regenerate every paper table, figure, and paper/paper_results_summary.json.
  4. paper/dropout_decay_regime_specific_pressure_law.tex builds the final paper.

The headline paper artifacts are:

  • OpenWebText10K
      • Coefficients: runs/coefficient_calibration/cross_regime_backtest/openwebtext10k_main_interaction/coefficients.json
      • Main 4M validation: runs/openwebtext10k_l16_updated_formula_clean_5seed/locked_stream/20260530-174525/
      • L20 8M validation: runs/openwebtext10k_l20_8m_extrapolation_3seed/locked_stream/20260601-191150/
  • TinyStories
      • Coefficients: runs/coefficient_calibration/tinystories_combined_plus_all_holdouts_interaction/coefficients.json
      • Main 4M validation: runs/streaming_tinystories_interaction_schedule_l12/locked_stream/20260530-053831/, runs/streaming_tinystories_multiseed_validation_l12/locked_stream/20260530-111523/, and runs/streaming_tinystories_multiseed_validation_l12/locked_stream/20260530-141335/
      • L20 8M validation: runs/tinystories_l20_8m_extrapolation_3seed/locked_stream/20260602-112249/
  • WikiText-103
      • Coefficients: runs/coefficient_calibration/wikitext103_interaction/coefficients.json
      • Main 4M validation: runs/wikitext103_l12_streaming_validation_5seed/locked_stream/20260531-093525/
      • L20 8M validation: runs/wikitext103_l20_8m_extrapolation_3seed/locked_stream/20260603-231952/

Every run writes:

  • config.json: command, model specs, data paths, environment, attribution.
  • metrics.jsonl: one row per seed/model/dropout/stage.
  • trace.jsonl: optional training and intermediate evaluation trace.
  • summary.csv / summary.json: mean/std train loss, validation loss, and gap.
  • model_selection.csv / model_selection.json: static-sweep optimum and plateau diagnostics for screen and confirm runs.
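
To show how the per-row metrics relate to the mean/std summaries, here is a minimal aggregation sketch over JSON Lines input. The field names model, dropout, and val_loss are assumptions made for illustration, as is the use of the sample standard deviation; the actual metrics.jsonl schema and summary code may differ.

```python
import json
import statistics
from collections import defaultdict


def summarize(lines, key_fields=("model", "dropout"), value_field="val_loss"):
    """Group JSONL rows by key_fields and report n, mean, and std of value_field.

    Field names are hypothetical placeholders for the real schema.
    """
    groups = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        row = json.loads(line)
        key = tuple(row[field] for field in key_fields)
        groups[key].append(float(row[value_field]))

    summary = {}
    for key, values in groups.items():
        # Sample standard deviation needs at least two values.
        std = statistics.stdev(values) if len(values) > 1 else 0.0
        summary[key] = {"n": len(values), "mean": statistics.fmean(values), "std": std}
    return summary
```

Because summarize accepts any iterable of lines, an open file handle for a run's metrics.jsonl can be passed to it directly.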

Old exploratory outputs are archived under archive/; they are not part of the primary evidence path for the paper.

For exact headline reproduction, see REPRODUCING.md. Current documentation is:

  • docs/plan.md: regime-validation and coefficient-fitting plan.
  • docs/formula_coefficient_methodology.md: coefficient target extraction, weighting, and fitting details.
  • docs/openwebtext10k_streaming_report.md: OpenWebText10K regime report.
  • docs/tinystories_streaming_report.md: TinyStories regime report.
  • docs/wikitext103_streaming_report.md: WikiText-103 regime report.
  • paper/dropout_decay_regime_specific_pressure_law.tex: paper source.
  • paper/dropout_decay_regime_specific_pressure_law.pdf: generated paper PDF.