randyGPT: model-ds-moe

A GPT-style language model trained from scratch in Rust on Project Gutenberg.

Model Details

Architecture: MoE Transformer (causal LM)
Parameters: 4.48M
Layers: 12
Heads: 4
Embedding dim: 128
Experts: 4 (top-2 routing, expert dim 256)
Context window: 256 tokens
Vocab size: 2000 (BPE)
Training iterations: 12850
Best validation loss: 3.5012
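
As a rough sanity check on the parameter count, the figures above can be plugged into a back-of-the-envelope estimate. This is only a sketch under assumptions the card does not confirm: no bias terms, no normalization weights, learned positional embeddings, an untied output head, and two-matrix feed-forward experts. The function name `estimate_params` is hypothetical.

```python
def estimate_params(vocab=2000, dim=128, layers=12, experts=4,
                    expert_dim=256, context=256):
    """Estimate parameters from the Model Details figures.

    Assumptions (not stated in the card): no biases, no norm weights,
    learned positional embeddings, untied LM head.
    """
    token_embed = vocab * dim                # token embedding table
    pos_embed = context * dim                # learned positional embeddings
    attention = 4 * dim * dim                # Q, K, V and output projections
    router = dim * experts                   # gating layer that scores experts
    moe = experts * 2 * dim * expert_dim     # up and down projection per expert
    per_layer = attention + router + moe
    lm_head = vocab * dim                    # untied output projection
    return token_embed + pos_embed + layers * per_layer + lm_head


if __name__ == "__main__":
    total = estimate_params()
    print(f"{total:,} parameters (~{total / 1e6:.2f}M)")
```

Under these assumptions the estimate lands close to the reported 4.48M, which suggests the listed dimensions are mutually consistent. Note that with top-2 routing only two of the four experts run per token, so the active parameter count per token is lower than the total.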

Training

Trained on ~98 MB of cleaned Project Gutenberg text (112 public-domain books; v3 cleaning with Unicode normalization), tokenized with a 2000-token BPE vocabulary. Optimization used AdamW with cosine LR decay and ReduceLROnPlateau, with dropout=0.1. Training ran on the Metal GPU backend via Candle on Apple Silicon.
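
For readers unfamiliar with cosine LR decay, the schedule can be sketched as follows. The peak rate, floor rate, and warmup length here are illustrative assumptions (the card does not report them), and `cosine_lr` is a hypothetical helper, not part of randyGPT. How the training run combined this with ReduceLROnPlateau is not described, so that part is not modeled.

```python
import math


def cosine_lr(step, total_steps=12850, max_lr=3e-4, min_lr=3e-5,
              warmup_steps=100):
    """Learning rate at `step` under linear warmup plus cosine decay.

    max_lr, min_lr and warmup_steps are assumed values for illustration;
    only total_steps (12850) comes from the model card.
    """
    if step < warmup_steps:
        # Linear ramp from near zero up to max_lr.
        return max_lr * (step + 1) / warmup_steps
    # Fraction of the decay phase completed, clamped to [0, 1].
    progress = min(1.0, (step - warmup_steps) / max(1, total_steps - warmup_steps))
    # Cosine curve from max_lr (progress=0) down to min_lr (progress=1).
    return min_lr + 0.5 * (max_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for s in (0, 100, 6475, 12850):
        print(s, f"{cosine_lr(s):.6f}")
```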

Usage

from modeling_randygpt import RandyGPTConfig, RandyGPTForCausalLM
from tokenizer_randygpt import RandyGPTTokenizer
from safetensors.torch import load_file
import torch

# Load
cfg   = RandyGPTConfig.from_pretrained("MonumentalSystems/randygpt-ds-moe")
model = RandyGPTForCausalLM(cfg)
state = load_file("model.safetensors")
model.load_state_dict(state, strict=True)
model.eval()

tok = RandyGPTTokenizer.from_file("tokenizer.json")

# Generate
prompt  = "Once upon a time"
ids     = torch.tensor([tok.encode(prompt)], dtype=torch.long)
out_ids = model.generate_text(ids, max_new_tokens=200, temperature=0.8)
print(tok.decode(out_ids[0].tolist()))
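
The `temperature=0.8` argument above presumably controls how sharply the model's next-token distribution is peaked. The internals of `generate_text` are not shown in this card, so the following is a generic, dependency-free illustration of temperature sampling rather than the model's actual implementation; `sample_with_temperature` is a hypothetical name.

```python
import math
import random


def sample_with_temperature(logits, temperature=0.8, rng=random):
    """Sample an index from `logits` after temperature scaling.

    Returns (index, probabilities). Temperatures below 1 sharpen the
    distribution; temperatures above 1 flatten it.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)                            # subtract max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    index = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return index, probs


if __name__ == "__main__":
    toy_logits = [2.0, 1.0, 0.1]                  # made-up scores for three tokens
    for t in (0.5, 0.8, 1.5):
        _, p = sample_with_temperature(toy_logits, temperature=t)
        print(t, [round(x, 3) for x in p])
```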

Source

Trained with randyGPT, a GPT implementation in Rust with Metal GPU acceleration.

Weights: Safetensors, 4.48M params, tensor type F32