Alcyone — Base Model (pre-chat)

A 30.5M parameter GPT-2 style language model, pretrained from scratch on TinyStories.

This is the base model: fluent at continuing text, but not yet conversational. It is the foundation from which the upcoming chat model will be fine-tuned.

Named after the brightest star in the Pleiades cluster — part of a star-named model family.

Versioning philosophy

This project reserves the v1 milestone for the first chat-capable model. Everything before that — including this checkpoint — is considered the v0 / base-model stage.

v0 stage (now): pretraining from scratch. Goal = a model that writes coherent English text. ✅ achieved here.
v1 (upcoming): instruction fine-tuning (SFT) on top of this base, turning a text-completer into an assistant you can actually chat with.

In other words: this model is the raw fluency; v1 will add the ability to follow instructions.

From first experiment to this base model

	First experiment	This base model
Parameters	~4.2M	30.5M
Layers	4	8
Embedding dim	256	512
Attention heads	4	8
Context length	128	256
Vocab size	4,000	10,000
Training data	5k stories	500k stories
Training steps	300	8,000
Final eval loss	—	1.81

The first experiment proved the from-scratch training loop works end-to-end. This base model scales it up into something that writes genuinely coherent stories — consistent characters, dialogue, and cause-and-effect.

Model details


Architecture	GPT-2 (decoder-only Transformer)
Parameters	30.5M
Layers / Heads / Embedding	8 / 8 / 512
Context length	256 tokens
Vocab size	10,000
Tokenizer	Byte-level BPE, trained from scratch on TinyStories
Initialization	Random (no pretrained weights)
Objective	Causal language modeling (next-token prediction)
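The 30.5M figure can be cross-checked against the table above. The sketch below counts parameters for a standard GPT-2 layout. It assumes learned position embeddings, a 4x MLP expansion, biases on every linear layer, and an output head tied to the token embedding. These are the usual GPT-2 defaults, but the card does not state them explicitly. The function name is illustrative.

```python
def gpt2_param_count(vocab_size, n_positions, n_embd, n_layer):
    """Count parameters of a GPT-2 style decoder with a tied output head."""
    d = n_embd
    token_emb = vocab_size * d            # token embedding (shared with the LM head)
    pos_emb = n_positions * d             # learned position embedding
    attn = (d * 3 * d + 3 * d) + (d * d + d)      # fused QKV projection + output projection
    mlp = (d * 4 * d + 4 * d) + (4 * d * d + d)   # up-projection + down-projection
    layer_norms = 2 * (2 * d)             # two LayerNorms per block (weight + bias)
    per_layer = attn + mlp + layer_norms
    final_ln = 2 * d                      # final LayerNorm after the last block
    return token_emb + pos_emb + n_layer * per_layer + final_ln


base = gpt2_param_count(vocab_size=10_000, n_positions=256, n_embd=512, n_layer=8)
first = gpt2_param_count(vocab_size=4_000, n_positions=128, n_embd=256, n_layer=4)
print(f"base model:       {base:,}")
print(f"first experiment: {first:,}")
```

Under these assumptions the counts come to 30,471,168 and 4,216,320, in line with the 30.5M and ~4.2M figures quoted in this card.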

Training


Data	roneneldan/TinyStories — 500k story subset
Hardware	Google Colab Free Tier (NVIDIA Tesla T4, 16GB)
Precision	fp16
Optimizer	AdamW (HF Trainer default)
Learning rate	5e-4, cosine schedule, 200 warmup steps
Batch size	32
Steps	8,000 (~0.6 epoch)
Wall-clock time	~41 minutes
Train loss	1.82
Eval loss	1.81
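To make the learning-rate row concrete, here is a minimal sketch of a cosine schedule with linear warmup, using the values from the table (peak 5e-4, 200 warmup steps, 8,000 total steps). It assumes a linear ramp from zero and a half-cosine decay to zero, which is how the Trainer's "cosine" scheduler behaves by default. The card does not confirm the exact scheduler settings, and `lr_at` is an illustrative name.

```python
import math


def lr_at(step, peak_lr=5e-4, warmup_steps=200, total_steps=8000):
    """Learning rate at a given optimizer step (assumed schedule, see lead-in)."""
    if step < warmup_steps:
        # Linear warmup from 0 up to the peak learning rate.
        return peak_lr * step / warmup_steps
    # Half-cosine decay from the peak down to 0 over the remaining steps.
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


for step in (0, 100, 200, 4100, 8000):
    print(f"step {step:>5}: lr = {lr_at(step):.2e}")
```

The rate reaches its peak at step 200, is back to half the peak at step 4,100 (the midpoint of the decay), and is effectively zero at step 8,000.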

Train and eval loss stayed nearly identical throughout training and finished at 1.82 and 1.81 respectively, so there is no sign of overfitting, as expected for a run that covers only about 0.6 of an epoch. The eval-loss curve plateaued near the end, which suggests that further gains would come from a larger model or more data rather than from longer training.
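The numbers above can be turned into two quantities that are easier to compare across runs: perplexity and an approximate token budget. This is a back-of-the-envelope sketch. It assumes the reported losses are mean cross-entropy in nats, that the batch size of 32 is the number of sequences per optimizer step (no gradient accumulation), and that every sequence is a full 256 tokens, so the token count is an upper bound.

```python
import math

steps, batch_size, context_len = 8_000, 32, 256
wall_clock_s = 41 * 60  # ~41 minutes, from the training table

# Upper bound on tokens processed: every sequence assumed to be full length.
max_tokens = steps * batch_size * context_len
print(f"tokens processed (upper bound): {max_tokens:,}")
print(f"throughput (upper bound): {max_tokens / wall_clock_s:,.0f} tokens/s")

# Perplexity is exp(loss) when the loss is mean cross-entropy in nats.
for name, loss in (("train", 1.82), ("eval", 1.81)):
    print(f"{name} perplexity: {math.exp(loss):.2f}")
```

Under these assumptions the run processes at most about 65.5M tokens, and an eval loss of 1.81 corresponds to a perplexity of roughly 6.1 over the 10,000-token vocabulary.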

Intended use & limitations

This is a base language model, not an instruction-tuned chat model. Given a short English prompt, it continues the text in TinyStories style.

It will NOT (yet):

follow instructions or answer questions — that is what the v1 chat model will add
produce text outside the children's-story domain
maintain perfect long-range plot coherence (local grammar is strong; global plot occasionally drifts)
stop cleanly at the end of a story (it tends to begin a new one, since it was trained on a continuous stream)
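The last limitation can be softened in post-processing. The helper below is a hypothetical sketch, not part of the released code. It assumes stories are separated by an end-of-text marker that appears in the decoded output as `<|endoftext|>`; the actual marker for this tokenizer is not stated in the card, so check it before relying on this. If no marker is found, it only drops a trailing partial sentence.

```python
def trim_to_first_story(text, eos_marker="<|endoftext|>"):
    """Keep only the first story from a generated continuation.

    `eos_marker` is an assumed separator string; adjust it to whatever
    the tokenizer actually emits between stories.
    """
    # If the marker was decoded into the output, cut everything after it.
    head, found, _ = text.partition(eos_marker)
    if found:
        return head.strip()
    # Otherwise fall back to trimming a trailing partial sentence, if any.
    last_end = max(text.rfind(p) for p in ".!?")
    if last_end == -1:
        return text.strip()
    return text[: last_end + 1].strip()


sample = "Lily thanked her mommy.<|endoftext|>Once upon a time, there was a"
print(trim_to_first_story(sample))
```

The function takes a plain string, so it can be applied to the `generated_text` value returned by a text-generation pipeline.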

Quick start

from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

REPO = "laskar-ks/alcyone-v1"   # switch to "laskar-ks/alcyone-base" if the repo is renamed

model     = AutoModelForCausalLM.from_pretrained(REPO)
tokenizer = AutoTokenizer.from_pretrained(REPO)

gen = pipeline("text-generation", model=model, tokenizer=tokenizer)
print(gen("Once upon a time, there was a little",
          max_new_tokens=120, do_sample=True, temperature=0.8,
          clean_up_tokenization_spaces=False)[0]["generated_text"])

Example output

Once upon a time, there was a little girl named Lily. She loved to play outside in the sunshine. One day, she saw a butterfly flying around in the sky. She wanted to catch it, but she accidentally stepped on a rock and fell down. Her mommy came to help her... Lily felt better and thanked her mommy.

Roadmap

v0 stage — first experiment (4.2M): proof the from-scratch pretraining loop works.
v0 stage — base model (30.5M, this): coherent story generation. The foundation.
v1 — chat (upcoming): instruction fine-tuning (SFT) to turn this base into a conversational assistant. This is the milestone that earns the "v1" name.
Beyond: larger model / more data to push past the current loss plateau.

About the name

Alcyone (η Tauri) is the brightest star in the Pleiades open star cluster. The model is part of a star-named family alongside other projects (Parallax, Altair, etc.).

Author

Trained by Laskar as part of an AI engineering portfolio exploring agentic systems and foundational ML.
