StoryGPT

A 50M-parameter LLaMA-style decoder-only transformer pre-trained from scratch on the TinyStories dataset.

Built as an end-to-end portfolio (CV) project that demonstrates a production-style LLM pre-training pipeline.

Model Description

Component	Implementation
Attention	Grouped Query Attention (GQA), as in LLaMA 3 and the larger LLaMA 2 models
Position Encoding	Rotary Embeddings (RoPE)
Normalization	RMSNorm
Activation	SwiGLU FFN
Weight Tying	Embedding weight = Output head weight
Tokenizer	Custom BPE trained from scratch (16,384 vocab)

Config:

vocab_size    : 16,384
context_length: 512
emb_dim       : 512
n_heads       : 8
n_kv_heads    : 4   (GQA)
n_layers      : 8
ffn_hidden    : 1,376
Parameters    : ~50M
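
To make the GQA numbers in the config concrete, here is a minimal sketch of how 8 query heads could share 4 key/value heads. It assumes the common convention that consecutive query heads share one KV head and that the per-head width is emb_dim / n_heads; the helper name `kv_head_for` is hypothetical and is not taken from the StoryGPT source.

```python
# Sketch only: head grouping implied by the config above (assumed convention,
# not the StoryGPT implementation).
EMB_DIM = 512
N_HEADS = 8
N_KV_HEADS = 4

HEAD_DIM = EMB_DIM // N_HEADS        # width of each attention head
GROUP_SIZE = N_HEADS // N_KV_HEADS   # query heads served by one KV head


def kv_head_for(query_head: int) -> int:
    """Return the index of the KV head that a given query head reads from."""
    if not 0 <= query_head < N_HEADS:
        raise ValueError(f"query_head must be in [0, {N_HEADS})")
    return query_head // GROUP_SIZE


# One entry per query head, giving the KV head it shares
mapping = [kv_head_for(h) for h in range(N_HEADS)]

# Output width of the K projection (and of the V projection)
kv_proj_width = N_KV_HEADS * HEAD_DIM
```

Under these assumptions the K and V projections are each half as wide as the query projection, which is where GQA saves parameters and KV-cache memory compared with full multi-head attention.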

Training

Dataset: TinyStories (150k-story subset, ~40M tokens)
Steps: 20,000
Optimizer: AdamW (β=(0.9, 0.95), weight_decay=0.1)
LR Schedule: Cosine decay with linear warmup (500 steps), peak 3e-4 → min 3e-5
Gradient Clipping: 1.0
Mixed Precision: torch.cuda.amp (AMP float16)
Hardware: 2× NVIDIA T4 (DataParallel) on Kaggle
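
The learning-rate schedule listed above can be sketched as a plain function of the step index. This is an illustration under stated assumptions rather than the training script itself: it assumes warmup rises linearly to the peak over the first 500 steps and that the cosine decay reaches the minimum exactly at step 20,000. The function name `lr_at` is hypothetical.

```python
import math

PEAK_LR = 3e-4
MIN_LR = 3e-5
WARMUP_STEPS = 500
TOTAL_STEPS = 20_000


def lr_at(step: int) -> float:
    """Learning rate for a 0-indexed optimizer step (assumed schedule shape)."""
    if step < WARMUP_STEPS:
        # Linear warmup: reaches PEAK_LR on the last warmup step
        return PEAK_LR * (step + 1) / WARMUP_STEPS
    # Fraction of the decay phase completed, clamped to [0, 1]
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    progress = min(progress, 1.0)
    # Cosine factor goes from 1 (start of decay) to 0 (end of decay)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return MIN_LR + (PEAK_LR - MIN_LR) * cosine
```

One way to use such a function with PyTorch would be `torch.optim.lr_scheduler.LambdaLR`, which expects a multiplier on the optimizer's base rate, so the lambda would return `lr_at(step) / PEAK_LR` with the optimizer created at `lr=PEAK_LR`.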

Results

Metric	Value
Train Loss	1.36
Val Loss	1.41
Perplexity	4.09
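
The perplexity row follows from the validation loss. A short check, assuming the loss is the mean per-token cross-entropy in nats (the PyTorch default):

```python
import math

# Rounded validation loss from the table above, assumed to be in nats per token
val_loss = 1.41

# Perplexity is the exponential of the mean cross-entropy
perplexity = math.exp(val_loss)

# Loss value that would correspond exactly to the reported perplexity of 4.09
implied_loss = math.log(4.09)
```

Exponentiating the rounded loss gives roughly 4.1, and the reported 4.09 corresponds to a loss slightly below 1.41, so the two table entries agree to within rounding.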

Usage

import torch
from huggingface_hub import hf_hub_download
from tokenizers import Tokenizer

# Download model and tokenizer
weights_path = hf_hub_download(repo_id="YOUR_HF_USERNAME/StoryGPT", filename="best_model.pt")
tok_path     = hf_hub_download(repo_id="YOUR_HF_USERNAME/StoryGPT", filename="storygpt_tokenizer.json")

tokenizer = Tokenizer.from_file(tok_path)

# Load model (copy model source files locally first)
from StoryGPT.model.gpt import GPT
from StoryGPT.config import MODEL_CONFIG

model = GPT(MODEL_CONFIG)
weights = torch.load(weights_path, map_location="cpu")
# Checkpoints saved from nn.DataParallel prefix every key with "module."; strip only that prefix
if next(iter(weights)).startswith("module."):
    weights = {k.removeprefix("module."): v for k, v in weights.items()}
model.load_state_dict(weights)
model.eval()
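
Once the model is loaded, text can be produced by repeatedly picking a next token. The sketch below uses greedy decoding, chosen here only because it is the simplest option; the card does not say which decoding method produced the sample output. The loop is framework-agnostic: `next_logits` is a hypothetical callable that maps a list of token ids to one score per vocabulary entry for the next position.

```python
from typing import Callable, List, Sequence


def greedy_generate(
    next_logits: Callable[[List[int]], Sequence[float]],
    prompt_ids: Sequence[int],
    max_new_tokens: int,
    context_length: int = 512,
) -> List[int]:
    """Append max_new_tokens ids, always taking the highest-scoring token."""
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        # Keep only the most recent tokens that fit in the context window
        window = ids[-context_length:]
        logits = next_logits(window)
        # Index of the largest logit is the greedy choice
        next_id = max(range(len(logits)), key=logits.__getitem__)
        ids.append(next_id)
    return ids


# Possible wiring to the objects loaded above. ASSUMPTION: model(x) returns
# logits of shape (batch, seq_len, vocab_size); check the GPT class first.
#
#   def next_logits(window):
#       with torch.no_grad():
#           return model(torch.tensor([window]))[0, -1].tolist()
#
#   prompt_ids = tokenizer.encode("Once upon a time").ids
#   print(tokenizer.decode(greedy_generate(next_logits, prompt_ids, 100)))
```

Greedy decoding is deterministic and tends to repeat itself on longer outputs; temperature or top-k sampling are common alternatives if more varied stories are wanted.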

Sample Output

Once upon a time, there was a little boy named Timmy. Timmy loved to play with his toys and go on adventures. One day, he decided to explore the forest near his house...

License

MIT

