Alcyone: Base Model (pre-chat)

A 30.5M-parameter GPT-2-style language model, pretrained from scratch on TinyStories.

This is the base model: fluent at continuing text, but not yet conversational. It is the foundation from which the upcoming chat model will be fine-tuned.

Named after the brightest star in the Pleiades cluster, as part of a star-named model family.

Versioning philosophy

This project reserves the v1 milestone for the first chat-capable model. Everything before that (including this checkpoint) is considered the v0 / base-model stage.

  • v0 stage (now): pretraining from scratch. Goal = a model that writes coherent English text. ✅ achieved here.
  • v1 (upcoming): instruction fine-tuning (SFT) on top of this base, turning a text-completer into an assistant you can actually chat with.

In other words: this model is the raw fluency; v1 will add the ability to follow instructions.

From first experiment to this base model

Metric             First experiment   This base model
Parameters         ~4.2M              30.5M
Layers             4                  8
Embedding dim      256                512
Attention heads    4                  8
Context length     128                256
Vocab size         4,000              10,000
Training data      5k stories         500k stories
Training steps     300                8,000
Final eval loss    n/a                1.81

The first experiment proved the from-scratch training loop works end-to-end. This base model scales it up into something that writes genuinely coherent stories, with consistent characters, dialogue, and cause-and-effect.

Model details

Architecture                 GPT-2 (decoder-only Transformer)
Parameters                   30.5M
Layers / Heads / Embedding   8 / 8 / 512
Context length               256 tokens
Vocab size                   10,000
Tokenizer                    Byte-level BPE, trained from scratch on TinyStories
Initialization               Random (no pretrained weights)
Objective                    Causal language modeling (next-token prediction)
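
As a sanity check on the 30.5M figure, the parameter count can be reproduced from the table with plain arithmetic. This is a minimal sketch that assumes the standard GPT-2 layout (learned positional embeddings, a 4x MLP expansion, biases on every linear layer, and an output head tied to the token embedding); the card does not confirm these details, and `gpt2_param_count` is a hypothetical helper written for illustration.

```python
def gpt2_param_count(n_layer, d_model, vocab_size, n_ctx, tied_head=True):
    """Count the parameters of a GPT-2 style decoder (assumed standard layout)."""
    token_emb = vocab_size * d_model          # token embedding matrix
    pos_emb = n_ctx * d_model                 # learned positional embeddings
    # Attention: fused QKV projection plus output projection, both with bias
    attn = (d_model * 3 * d_model + 3 * d_model) + (d_model * d_model + d_model)
    # MLP: expand to 4 * d_model, then project back, both with bias
    mlp = (d_model * 4 * d_model + 4 * d_model) + (4 * d_model * d_model + d_model)
    layer_norms = 2 * (2 * d_model)           # two LayerNorms per block (scale + shift)
    per_layer = attn + mlp + layer_norms
    final_ln = 2 * d_model                    # LayerNorm after the last block
    lm_head = 0 if tied_head else vocab_size * d_model
    return token_emb + pos_emb + n_layer * per_layer + final_ln + lm_head


# Values taken from the model details table above
base = gpt2_param_count(n_layer=8, d_model=512, vocab_size=10_000, n_ctx=256)
print(f"{base:,} parameters (~{base / 1e6:.1f}M)")
```

Under these assumptions the count comes to 30,471,168, which rounds to the 30.5M reported above. The number of attention heads does not appear in the formula because the heads split the same projection matrices rather than adding weights.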

Training

Data              roneneldan/TinyStories (500k story subset)
Hardware          Google Colab Free Tier (NVIDIA Tesla T4, 16 GB)
Precision         fp16
Optimizer         AdamW (HF Trainer default)
Learning rate     5e-4, cosine schedule, 200 warmup steps
Batch size        32
Steps             8,000 (~0.6 epoch)
Wall-clock time   ~41 minutes
Train loss        1.82
Eval loss         1.81
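
To make the learning-rate row concrete, here is a small sketch of the schedule's shape. It assumes linear warmup from zero followed by a single half-cosine decay to zero over the remaining steps, which is how the cosine scheduler in Hugging Face Transformers behaves with its defaults; `lr_at_step` is a hypothetical helper, not code from the training run.

```python
import math


def lr_at_step(step, base_lr=5e-4, warmup_steps=200, total_steps=8_000):
    """Learning rate at a given optimizer step (assumed warmup + cosine decay)."""
    if step < warmup_steps:
        # Linear warmup from 0 up to base_lr
        return base_lr * step / warmup_steps
    # Fraction of the post-warmup phase completed, in [0, 1]
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    # Half-cosine decay from base_lr down to 0
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


for step in (0, 100, 200, 4_100, 8_000):
    print(f"step {step:>5}: lr = {lr_at_step(step):.2e}")
```

Under this assumption the rate peaks at 5e-4 at step 200, passes through half of that midway through the decay (step 4,100), and reaches zero at step 8,000.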

Train and eval loss stayed nearly identical throughout (finishing at 1.82 vs 1.81), indicating that the model generalizes to held-out stories with no sign of overfitting. The eval-loss curve plateaued near the end, suggesting that further gains would come from a larger model or more data rather than longer training.
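
For a more intuitive reading of these losses, they can be converted to per-token perplexity. The sketch below assumes the reported values are mean cross-entropy in nats per token, which is what the Hugging Face Trainer reports for causal language modeling by default; the card does not state the unit explicitly.

```python
import math

# Final losses from the training table (assumed to be in nats per token)
train_loss, eval_loss = 1.82, 1.81

# Perplexity is the exponential of the mean cross-entropy
train_ppl = math.exp(train_loss)
eval_ppl = math.exp(eval_loss)

print(f"train perplexity ~ {train_ppl:.2f}")
print(f"eval perplexity  ~ {eval_ppl:.2f}")
```

Both values land at roughly 6.1. Because the tokenizer has a 10,000-token vocabulary trained specifically on TinyStories, this perplexity is not directly comparable to models that use a different tokenizer.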

Intended use & limitations

A base language model, not an instruction-tuned chat model. Given a short English prompt, it continues the text in TinyStories style.

It will NOT (yet):

  • follow instructions or answer questions (that is what the v1 chat model will add)
  • produce text outside the children's-story domain
  • maintain perfect long-range plot coherence (local grammar is strong; global plot occasionally drifts)
  • stop cleanly at the end of a story (it tends to begin a new one, since it was trained on a continuous stream)

Quick start

from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

# Hub repository ID; change to "laskar-ks/alcyone-base" if the repo is renamed
REPO = "laskar-ks/alcyone-v1"

# Load the pretrained weights and the matching byte-level BPE tokenizer
model     = AutoModelForCausalLM.from_pretrained(REPO)
tokenizer = AutoTokenizer.from_pretrained(REPO)

# Sample a continuation of up to 120 new tokens at temperature 0.8
gen = pipeline("text-generation", model=model, tokenizer=tokenizer)
print(gen("Once upon a time, there was a little",
          max_new_tokens=120, do_sample=True, temperature=0.8,
          clean_up_tokenization_spaces=False)[0]["generated_text"])

Example output

Once upon a time, there was a little girl named Lily. She loved to play outside in the sunshine. One day, she saw a butterfly flying around in the sky. She wanted to catch it, but she accidentally stepped on a rock and fell down. Her mommy came to help her... Lily felt better and thanked her mommy.

Roadmap

  • v0 stage, first experiment (4.2M): proof that the from-scratch pretraining loop works.
  • v0 stage, base model (30.5M, this): coherent story generation. The foundation.
  • v1, chat (upcoming): instruction fine-tuning (SFT) to turn this base into a conversational assistant. This is the milestone that earns the "v1" name.
  • Beyond: larger model / more data to push past the current loss plateau.

About the name

Alcyone (η Tauri) is the brightest star in the Pleiades open star cluster. The model is part of a star-named family alongside other projects (Parallax, Altair, etc.).

Author

Trained by Laskar as part of an AI engineering portfolio exploring agentic systems and foundational ML.
