nanoGPT-TR-v4 — Türkçe 50M Dil Modeli

Sıfırdan eğitilmiş 50M parametreli Türkçe dil modeli.

Mimari

  • Parametre: 50.88M (non-embedding)
  • Modern teknikler: RoPE + RMSNorm + SwiGLU + QK-Norm + Logit soft-cap
  • Optimizer: Muon (2D) + AdamW (1D + embedding)
  • Context: 512 (RoPE ile 1024'e extrapolate)
  • Vocab: 16K byte-level BPE Türkçe-optimize (~4.22 char/token)

Eğitim

  • Corpus: ~1.86B Türkçe token (filtreli)
  • Kaynaklar: Wikipedia TR, YÖK Tezleri özetleri, FinePDFs-Edu, FineWeb2-HQ, FineTranslations, akademik dergiler, OSCAR23
  • Eğitim: 30K step, etkin batch 256, ~2.5 epoch
  • Donanım: RTX 6000 Pro Blackwell
  • Final val loss: ~3.20 (perplexity ~24)

Varyantlar

  • v4_base.pt: Pretraining sonrası (continuation/completion)
  • v4_instruct.pt: SFT sonrası (ChatML format)

Kullanım

import torch
from model_v4 import GPTV4, GPTConfigV4
from tokenizers import Tokenizer

ckpt = torch.load("v4_base.pt", weights_only=False)
cfg = GPTConfigV4(**ckpt["config"])
model = GPTV4(cfg).cuda()
model.load_state_dict(ckpt["model"])
model.eval()

tok = Tokenizer.from_file("tokenizer-tr-16k.json")
ids = tok.encode("Bu çalışmada, ").ids
x = torch.tensor([ids]).cuda()
out = model.generate(x, max_new_tokens=200, temperature=0.8, top_k=50,
                    repetition_penalty=1.15)
print(tok.decode(out[0].tolist()))
Downloads last month

-

Downloads are not tracked for this model. How to track
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support