MiniBananaMind-v3-9M

MiniBananaMind-v3-9M is a small causal language model trained from scratch on FineWeb-Edu and FineMath.

The model has about 8.9M parameters and uses a custom 8k-token byte-level BPE tokenizer with digit-aware tokenization.

This is a base language model, not an instruction-tuned chat assistant.

Model Details

Field	Value
Parameters	8,884,992
Architecture	Custom Llama-style decoder
Layers	9
Hidden size	256
Intermediate size	768
Attention heads	8
KV heads	2
Vocabulary size	8,192
Context length	1,024
Embeddings	Tied input/output embeddings
Weight format	safetensors
Training precision	BF16
Checkpoint used	latest mixed checkpoint
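
The parameter count in the table can be cross-checked from the other fields. The sketch below is a rough tally, not the model's actual code. It assumes a conventional Llama-style layout that the card does not spell out: a head dimension of hidden size divided by attention heads, bias-free linear layers, a gated MLP with three projections, and two RMSNorm weight vectors per layer plus one final norm.

```python
# Configuration values copied from the Model Details table.
vocab_size = 8192
hidden = 256
intermediate = 768
n_layers = 9
n_heads = 8
n_kv_heads = 2

# Assumptions (not stated in the card): head_dim = hidden / n_heads,
# no bias terms, gate/up/down MLP projections, RMSNorm weights only.
head_dim = hidden // n_heads        # 32
kv_dim = n_kv_heads * head_dim      # 64, shared across query-head groups

embedding = vocab_size * hidden     # tied input/output, so counted once
attention = 2 * hidden * hidden + 2 * hidden * kv_dim  # q and o, then k and v
mlp = 3 * hidden * intermediate     # gate, up, and down projections
norms = 2 * hidden                  # pre-attention and pre-MLP norms
per_layer = attention + mlp + norms
final_norm = hidden

total = embedding + n_layers * per_layer + final_norm
print(f"{total:,}")  # 8,884,992
```

Under these assumptions the tally matches the reported 8,884,992 parameters, with the tied embedding matrix accounting for 2,097,152 of them.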

Tokenizer

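MiniBananaMind-v3-9M uses a custom digit-aware byte-level BPE tokenizer with an 8,192-token vocabulary.

Each digit is always encoded as its own token, so a multi-digit number becomes a sequence of single-digit tokens instead of being merged into a larger number token.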

Digit IDs:

Token	ID
`1`	9
`2`	10
`3`	11
`4`	12
`5`	13
`6`	14
`7`	15
`8`	16
`9`	17
`0`	18

Examples:

18  -> [9, 16]
227 -> [10, 10, 15]
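
The digit mapping above can be reproduced with a small lookup, which may be handy for sanity-checking tokenizer output. This is only an illustrative sketch: `encode_digits` is a hypothetical helper, it handles pure digit strings only, and it ignores anything else the real tokenizer may do (special tokens, whitespace handling).

```python
# Digit-to-ID mapping copied from the Digit IDs table.
DIGIT_IDS = {
    "1": 9, "2": 10, "3": 11, "4": 12, "5": 13,
    "6": 14, "7": 15, "8": 16, "9": 17, "0": 18,
}


def encode_digits(number: str) -> list[int]:
    """Map a string of decimal digits to one token ID per digit."""
    if not number or any(ch not in DIGIT_IDS for ch in number):
        raise ValueError(f"expected a non-empty digit string, got {number!r}")
    return [DIGIT_IDS[ch] for ch in number]


print(encode_digits("18"))   # [9, 16]
print(encode_digits("227"))  # [10, 10, 15]
```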

Training Data

MiniBananaMind-v3-9M was trained on:

HuggingFaceFW/fineweb-edu
HuggingFaceTB/finemath

The training mix used both general educational web text and math-heavy text.

Dataset	Tokens
FineWeb-Edu sample-10BT retokenized with digit tokenizer	12,047,375,481
FineMath finemath-4plus retokenized with digit tokenizer	1,500,000,000

Training setup:

Field	Value
Sequence length	1,024
FineMath sampling ratio	30%
FineWeb sampling ratio	70%
Batch size	72
Gradient accumulation	16
Tokens per optimizer step	1,179,648
Training steps	11,471
Approx training tokens seen	13,531,742,208
Learning rate	5e-4
Minimum learning rate	5e-5
Warmup steps	500
Weight decay	0.1
Hardware	NVIDIA RTX 5070 Ti
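
The token figures in the table follow from the batch settings, and they also give a rough idea of how often each dataset was revisited. The sketch below assumes the 30%/70% sampling ratios apply per token, which the card does not state explicitly, so the pass counts are estimates only.

```python
# Values copied from the training setup and dataset tables.
batch_size = 72
grad_accum = 16
seq_len = 1024
steps = 11_471
fineweb_tokens = 12_047_375_481
finemath_tokens = 1_500_000_000

tokens_per_step = batch_size * grad_accum * seq_len
tokens_seen = tokens_per_step * steps
print(f"{tokens_per_step:,}")  # 1,179,648
print(f"{tokens_seen:,}")      # 13,531,742,208

# Assumption: the sampling ratios are token-level fractions of the mix.
finemath_passes = 0.30 * tokens_seen / finemath_tokens
fineweb_passes = 0.70 * tokens_seen / fineweb_tokens
print(f"FineMath passes: {finemath_passes:.2f}")
print(f"FineWeb-Edu passes: {fineweb_passes:.2f}")
```

Under that assumption, FineMath would have been seen about 2.7 times, while a bit under 80% of the FineWeb-Edu sample would have been seen once.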

Evaluation

Formal benchmark results for this checkpoint are not included yet.

Usage

This model uses custom architecture code, so load it with trust_remote_code=True.

Install dependencies:

pip install -U transformers safetensors torch

Run inference:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "BananaMind/MiniBananaMind-v3-9M"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

# Use the GPU when one is available; otherwise fall back to CPU in float32.
if torch.cuda.is_available():
    device = "cuda"
    dtype = torch.bfloat16 if torch.cuda.is_bf16_supported() else torch.float16
else:
    device, dtype = "cpu", torch.float32

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype=dtype,
).to(device).eval()

prompt = "The color of the sky is "
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)

with torch.no_grad():
    output = model.generate(
        input_ids=input_ids,
        max_new_tokens=64,
        do_sample=False,
        repetition_penalty=1.1,
        pad_token_id=tokenizer.eos_token_id,
        eos_token_id=tokenizer.eos_token_id,
    )

print(tokenizer.decode(output[0], skip_special_tokens=True))

Suggested Generation Settings

For stable continuations:

do_sample=False
repetition_penalty=1.1
max_new_tokens=64 to 128

For more varied text:

do_sample=True
temperature=0.6
top_p=0.9
repetition_penalty=1.1
max_new_tokens=64 to 128

License

Apache 2.0
