A 50M-parameter LLaMA-style, decoder-only transformer pre-trained from scratch on the TinyStories dataset (`roneneldan/TinyStories` on the Hugging Face Hub).
Built as an end-to-end portfolio (CV) project that demonstrates a full LLM pre-training pipeline: training a custom tokenizer, pre-training the model, and evaluating it on held-out data.

Architecture:

| Component | Implementation |
|---|---|
| Attention | Grouped Query Attention (GQA), as in LLaMA 3 and the larger LLaMA 2 models |
| Position Encoding | Rotary Embeddings (RoPE) |
| Normalization | RMSNorm |
| Activation | SwiGLU FFN |
| Weight Tying | Embedding weight = Output head weight |
| Tokenizer | Custom BPE trained from scratch (16,384 vocab) |
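
The components in the table can be sketched in PyTorch as one pre-norm decoder block. This is a minimal sketch, not the StoryGPT source. The following are assumptions drawn from common LLaMA-style implementations:

- the class names
- bias-free projections
- a RoPE base of 10,000 with interleaved channel pairs
- the pre-norm layout
- the use of `F.scaled_dot_product_attention` (PyTorch 2.0+)

Defaults follow the config values listed in this card: 512-dim embeddings, 8 query heads, 4 KV heads, and 1,376 FFN hidden units.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class RMSNorm(nn.Module):
    """Scale by the reciprocal root-mean-square of the features (no mean-centering, no bias)."""

    def __init__(self, dim, eps=1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x):
        return x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps) * self.weight


class SwiGLU(nn.Module):
    """FFN(x) = W2(SiLU(W1 x) * W3 x) with bias-free projections."""

    def __init__(self, dim, hidden):
        super().__init__()
        self.w1 = nn.Linear(dim, hidden, bias=False)  # gate projection
        self.w3 = nn.Linear(dim, hidden, bias=False)  # up projection
        self.w2 = nn.Linear(hidden, dim, bias=False)  # down projection

    def forward(self, x):
        return self.w2(F.silu(self.w1(x)) * self.w3(x))


def rope(x, base=10000.0):
    """Rotate channel pairs of x [batch, heads, seq, head_dim] by position-dependent angles."""
    _, _, seq, hd = x.shape
    inv_freq = 1.0 / base ** (torch.arange(0, hd, 2, device=x.device).float() / hd)
    angles = torch.arange(seq, device=x.device).float()[:, None] * inv_freq  # [seq, hd/2]
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., 0::2], x[..., 1::2]  # interleaved channel pairs
    out = torch.stack((x1 * cos - x2 * sin, x1 * sin + x2 * cos), dim=-1)
    return out.flatten(-2).type_as(x)


class GQAttention(nn.Module):
    """Causal self-attention where n_heads query heads share n_kv_heads key/value heads."""

    def __init__(self, dim, n_heads, n_kv_heads):
        super().__init__()
        self.n_heads, self.n_kv_heads = n_heads, n_kv_heads
        self.head_dim = dim // n_heads
        self.wq = nn.Linear(dim, n_heads * self.head_dim, bias=False)
        self.wk = nn.Linear(dim, n_kv_heads * self.head_dim, bias=False)
        self.wv = nn.Linear(dim, n_kv_heads * self.head_dim, bias=False)
        self.wo = nn.Linear(n_heads * self.head_dim, dim, bias=False)

    def forward(self, x):
        b, t, _ = x.shape
        q = self.wq(x).view(b, t, self.n_heads, self.head_dim).transpose(1, 2)
        k = self.wk(x).view(b, t, self.n_kv_heads, self.head_dim).transpose(1, 2)
        v = self.wv(x).view(b, t, self.n_kv_heads, self.head_dim).transpose(1, 2)
        q, k = rope(q), rope(k)  # RoPE is applied to queries and keys only
        # Each K/V head serves n_heads // n_kv_heads consecutive query heads
        rep = self.n_heads // self.n_kv_heads
        k, v = k.repeat_interleave(rep, dim=1), v.repeat_interleave(rep, dim=1)
        y = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.wo(y.transpose(1, 2).reshape(b, t, -1))


class Block(nn.Module):
    """Pre-norm decoder block: x + Attn(RMSNorm(x)), then x + FFN(RMSNorm(x))."""

    def __init__(self, dim=512, n_heads=8, n_kv_heads=4, ffn_hidden=1376):
        super().__init__()
        self.attn_norm, self.ffn_norm = RMSNorm(dim), RMSNorm(dim)
        self.attn = GQAttention(dim, n_heads, n_kv_heads)
        self.ffn = SwiGLU(dim, ffn_hidden)

    def forward(self, x):
        x = x + self.attn(self.attn_norm(x))
        return x + self.ffn(self.ffn_norm(x))
```

For example, `Block()(torch.randn(2, 16, 512))` returns a tensor of the same shape. With 8 query heads sharing 4 KV heads, `wk` and `wv` are half the size of `wq`, which also halves the KV cache at inference time.

Weight tying, the remaining row of the table, would be a single assignment in the full model. For example, `self.lm_head.weight = self.tok_emb.weight` for hypothetical `nn.Embedding(vocab_size, emb_dim)` and `nn.Linear(emb_dim, vocab_size, bias=False)` modules, whose weights share the shape `[vocab_size, emb_dim]`.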

Config:

```text
vocab_size     : 16,384
context_length : 512
emb_dim        : 512
n_heads        : 8
n_kv_heads     : 4 (GQA)
n_layers       : 8
ffn_hidden     : 1,376
Parameters     : ~50M
```
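
The parameter count can be checked against the listed config. The sketch below is arithmetic only. It assumes a standard bias-free LLaMA-style layout:

- separate Q/K/V/O projections
- a three-matrix SwiGLU FFN
- two RMSNorm gains per layer plus a final RMSNorm
- a tied output head

`count_params` is a hypothetical helper, not part of StoryGPT.

```python
def count_params(vocab_size=16_384, emb_dim=512, n_heads=8, n_kv_heads=4,
                 n_layers=8, ffn_hidden=1_376, tied=True):
    """Count weights of a bias-free LLaMA-style decoder (assumed layout)."""
    head_dim = emb_dim // n_heads                         # 64
    kv_dim = n_kv_heads * head_dim                        # 256: K and V are half-width under GQA
    attn = 2 * emb_dim * emb_dim + 2 * emb_dim * kv_dim   # Wq, Wo + Wk, Wv
    ffn = 3 * emb_dim * ffn_hidden                        # SwiGLU: gate, up, down
    norms = 2 * emb_dim                                   # two RMSNorm gains per block
    per_layer = attn + ffn + norms
    embedding = vocab_size * emb_dim
    head = 0 if tied else vocab_size * emb_dim            # a tied head reuses the embedding
    return n_layers * per_layer + embedding + head + emb_dim  # + final RMSNorm


print(count_params())            # 31597056
print(count_params(tied=False))  # 39985664
```

Under these assumptions the config comes to roughly 31.6M parameters, or about 40.0M with an untied head. That is below the ~50M quoted, so either the actual `GPT` class differs from these assumptions or the headline figure is a loose estimate. Running `sum(p.numel() for p in model.parameters())` on the loaded model gives the authoritative number.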
Results:

| Metric | Value |
|---|---|
| Train Loss | 1.36 |
| Val Loss | 1.41 |
| Val Perplexity | 4.09 |
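
Perplexity is the exponential of the mean per-token cross-entropy loss. This assumes the loss is measured in nats, the natural-log convention of PyTorch's `F.cross_entropy`. A quick check, with the helper name `perplexity` chosen for illustration:

```python
import math


def perplexity(mean_ce_loss):
    """Perplexity of a mean per-token cross-entropy loss measured in nats."""
    return math.exp(mean_ce_loss)


print(round(perplexity(1.41), 2))  # 4.1
print(round(perplexity(1.36), 2))  # 3.9
```

exp(1.41) ≈ 4.096, which is consistent with the reported 4.09 once rounding of the loss is taken into account. The perplexity figure therefore corresponds to the validation loss; the training loss of 1.36 would give about 3.90.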
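
Usage:

```python
import torch
from huggingface_hub import hf_hub_download
from tokenizers import Tokenizer

# Download the weights and the tokenizer from the Hub
repo_id = "YOUR_HF_USERNAME/StoryGPT"
weights_path = hf_hub_download(repo_id=repo_id, filename="best_model.pt")
tok_path = hf_hub_download(repo_id=repo_id, filename="storygpt_tokenizer.json")
tokenizer = Tokenizer.from_file(tok_path)

# GPT is a custom PyTorch module, not a transformers model: the StoryGPT source
# package must be importable locally (e.g. cloned and on PYTHONPATH)
from StoryGPT.model.gpt import GPT
from StoryGPT.config import MODEL_CONFIG

model = GPT(MODEL_CONFIG)

# The checkpoint is a state_dict of tensors, so weights_only=True is sufficient
weights = torch.load(weights_path, map_location="cpu", weights_only=True)

# Checkpoints saved from a DataParallel/DDP-wrapped model prefix every key with "module."
if next(iter(weights)).startswith("module."):
    weights = {k.removeprefix("module."): v for k, v in weights.items()}

model.load_state_dict(weights)
model.eval()  # inference mode (disables dropout, if any)
```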
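
The loading snippet stops at `model.eval()` without generating text. A minimal temperature and top-k sampling loop is sketched below. It assumes, hypothetically (since `GPT.forward` is not shown), that `model(idx)` takes a `[batch, seq]` tensor of token ids and returns logits of shape `[batch, seq, vocab_size]`. The function name `generate` and its defaults are illustrative.

```python
import torch


@torch.no_grad()
def generate(model, idx, max_new_tokens, context_length=512, temperature=1.0, top_k=None):
    """Autoregressively extend idx [batch, seq] by max_new_tokens token ids.

    Assumes model(idx) returns logits of shape [batch, seq, vocab_size].
    temperature=0.0 selects greedy decoding.
    """
    for _ in range(max_new_tokens):
        idx_cond = idx[:, -context_length:]       # crop to the context window
        logits = model(idx_cond)[:, -1, :]        # logits for the last position only
        if temperature == 0.0:
            next_id = logits.argmax(dim=-1, keepdim=True)
        else:
            logits = logits / temperature
            if top_k is not None:
                # Mask every logit below the k-th largest one
                kth = torch.topk(logits, min(top_k, logits.size(-1))).values[:, [-1]]
                logits = logits.masked_fill(logits < kth, float("-inf"))
            probs = torch.softmax(logits, dim=-1)
            next_id = torch.multinomial(probs, num_samples=1)
        idx = torch.cat([idx, next_id], dim=1)
    return idx
```

Using the `model` and `tokenizer` from the loading snippet, first build the prompt with `ids = torch.tensor([tokenizer.encode("Once upon a time").ids])`. Then `tokenizer.decode(generate(model, ids, max_new_tokens=200, temperature=0.8, top_k=50)[0].tolist())` returns the prompt plus a sampled continuation. Cropping to the last 512 tokens keeps inputs within `context_length`, the longest sequence length the model saw during training.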

Sample output:

```text
Once upon a time, there was a little boy named Timmy. Timmy loved to play with his toys and go on adventures. One day, he decided to explore the forest near his house...
```

License: MIT