A ~14.5M-parameter (≈6.3M non-embedding) decoder-only transformer trained from scratch in JAX / Flax NNX on TinyStories, reproducing the setup of Eldan & Li (2023, arXiv:2305.07759).
| metric | value |
|---|---|
| val loss (nats/token) | 1.680 |
| perplexity | 5.364 |
| bits/token | 2.423 |
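The three rows above are the same quantity in different units: perplexity is the exponential of the loss in nats, and bits/token is the loss divided by ln 2. A minimal sketch of the conversion follows; the function names are illustrative and not taken from the repo.

```python
import math


def perplexity(loss_nats: float) -> float:
    """Perplexity is exp of the mean cross-entropy measured in nats/token."""
    return math.exp(loss_nats)


def bits_per_token(loss_nats: float) -> float:
    """Convert nats/token to bits/token by dividing by ln(2)."""
    return loss_nats / math.log(2)


if __name__ == "__main__":
    val_loss = 1.680  # rounded value from the table
    print(f"perplexity  = {perplexity(val_loss):.3f}")
    print(f"bits/token  = {bits_per_token(val_loss):.3f}")
```

Starting from the rounded loss of 1.680, the results differ from the table in the third decimal place, presumably because the table was computed from the unrounded loss.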
Modern Llama/Mistral primitives, scaled down:
Weights are in `model.safetensors`. Reconstruct the model with the code from the GitHub repo and `load_safetensors()` (see `sample.py`). The tokenizer is in `tokenizer.json`.
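Before rebuilding the model, it can help to check that the downloaded checkpoint matches the parameter count quoted on this card. The sketch below does not use the repo's `load_safetensors()`, whose signature is not shown here. It assumes the `safetensors` and `numpy` packages are installed and that `model.safetensors` is in the working directory. The helper names `count_parameters` and `checkpoint_shapes` are hypothetical.

```python
from math import prod


def count_parameters(shapes):
    """Sum element counts over a {tensor_name: shape} mapping."""
    return sum(prod(shape) for shape in shapes.values())


def checkpoint_shapes(path="model.safetensors"):
    """Read tensor shapes from a safetensors file.

    Imports safetensors lazily so count_parameters() stays usable
    without it. Loads every tensor into memory as a NumPy array.
    """
    from safetensors.numpy import load_file

    return {name: tensor.shape for name, tensor in load_file(path).items()}


# Usage, assuming model.safetensors has been downloaded:
#   shapes = checkpoint_shapes()
#   print(f"{count_parameters(shapes) / 1e6:.2f}M parameters")
```

If the stored tensors cover the full model, the total should land near the ~14.5M figure above. A different number may just mean the checkpoint ties or omits some weights.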
Trained only on synthetic children's stories. The model produces coherent short English narratives, but it has weak long-range consistency and no factual or world knowledge. Not for general use.