Gros-Michel-90m-Base is a 90M-parameter bilingual LLM trained on 4.5 billion tokens of a custom dataset mixture, then further trained in a 2-billion-token continued pretraining run. The goal of this model is to provide a flexible base for finetuning on downstream tasks such as translation, sentiment analysis, and extraction.

Gros-Michel-90m-Base uses a tokenizer trained on both English and German data, with a vocabulary size of 20,000.
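Since this is a base model, the simplest way to try it is plain text continuation. The following is a minimal sketch, assuming the checkpoint loads through the generic `transformers` Auto classes (this card does not confirm that) and that `torch` is installed. The helper name `complete` is hypothetical.

```python
MODEL_ID = "finnianx/Gros-Michel-90m-Base"  # repo id as shown on the model page


def complete(prompt: str, max_new_tokens: int = 40) -> str:
    """Return the prompt plus a greedy continuation from the base model."""
    # Imported lazily so this file can be imported without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Assumption: the checkpoint is compatible with the generic Auto* classes.
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(
        input_ids=inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        max_new_tokens=max_new_tokens,
        do_sample=False,  # greedy decoding, so repeated calls give the same text
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


# Example call (downloads the weights on first use):
# print(complete("Die Hauptstadt von Deutschland ist"))
```

Prompts should be phrased as text to continue, in either English or German, since the model has not been instruction tuned.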

Pretraining data mixture

Dataset	Weight
`HuggingFaceFW/fineweb-edu`	38%
`epfml/FineWeb-HQ`	18%
`HuggingFaceTB/cosmopedia` (stories split)	18%
`HuggingFaceTB/finemath` (finemath-4plus)	6%
`finnianx/de_corpus`	20%
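To make the weights concrete, the sketch below converts them into approximate token counts for the 4.5 billion token pretraining run. It assumes the percentages are shares of the total token budget, which the card does not state explicitly. The function name `token_budget` is hypothetical.

```python
PRETRAIN_TOKENS = 4_500_000_000  # 4.5 billion tokens, as stated in this card

# Mixture weights in percent, copied from the pretraining table.
PRETRAIN_MIX = {
    "HuggingFaceFW/fineweb-edu": 38,
    "epfml/FineWeb-HQ": 18,
    "HuggingFaceTB/cosmopedia (stories split)": 18,
    "HuggingFaceTB/finemath (finemath-4plus)": 6,
    "finnianx/de_corpus": 20,
}


def token_budget(total_tokens: int, mix_percent: dict[str, int]) -> dict[str, int]:
    """Split a token budget across datasets by integer percent weights."""
    if sum(mix_percent.values()) != 100:
        raise ValueError("mixture weights must sum to 100%")
    return {name: total_tokens * pct // 100 for name, pct in mix_percent.items()}


budget = token_budget(PRETRAIN_TOKENS, PRETRAIN_MIX)
for name, tokens in budget.items():
    print(f"{name}: {tokens / 1e9:.2f}B tokens")
```

Under that assumption, the 20% weight on `finnianx/de_corpus` corresponds to roughly 0.9 billion tokens.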

Continued pretraining data mixture

Dataset	Weight
`wikimedia/wikipedia` (20231101.en)	40%
`wikimedia/wikipedia` (20231101.de)	40%
`HuggingFaceTB/finemath` (finemath-4plus)	20%
`HuggingFaceTB/finemath` (finemath-4plus)	20%
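One common way to realize a mixture like this is to pick the source of each training document at random, with probability proportional to its weight. The sketch below illustrates that idea with the continued pretraining weights. It is not necessarily how this model's data pipeline was built, and `sample_sources` is a hypothetical helper.

```python
import random
from collections import Counter

# Continued pretraining weights in percent, copied from the table.
CONTINUED_MIX = {
    "wikimedia/wikipedia 20231101.en": 40,
    "wikimedia/wikipedia 20231101.de": 40,
    "HuggingFaceTB/finemath finemath-4plus": 20,
}


def sample_sources(mix_percent: dict[str, int], n: int, seed: int = 0) -> list[str]:
    """Draw n source names, each with probability proportional to its weight."""
    rng = random.Random(seed)  # local generator keeps the draw reproducible
    names = list(mix_percent)
    weights = [mix_percent[name] for name in names]
    return rng.choices(names, weights=weights, k=n)


draws = sample_sources(CONTINUED_MIX, n=10_000)
counts = Counter(draws)
for name in CONTINUED_MIX:
    print(f"{name}: {counts[name] / len(draws):.1%} of draws")
```

With enough draws the observed shares approach 40/40/20, so English and German Wikipedia contribute equally in expectation.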

Comparison to other models

Maker	Model	HellaSwag	ARC (easy)	PIQA	BLiMP	Average
finnianx	Gros-Michel-90M	30.26%	41.50%	59.41%	78.35%	52.38%
finnianx	Michel-Nano-v2	27.40%	35.90%	56.75%	72.52%	48.14%
Axiomic Labs	GPT-S-5M	27.39%	33.16%	57.13%	72.21%	47.47%
EleutherAI	pythia-31m	27.14%	33.88%	56.26%	67.78%	46.27%
MaliosDark	Isabel-50M	27.1%	43.81%	57.12%	73.75%	50.44%
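The Average column appears to be the unweighted mean of the four benchmark scores. The sketch below recomputes it from the table rows as a sanity check; the variable and function names are illustrative only.

```python
# Scores in percent, copied from the table: HellaSwag, ARC (easy), PIQA, BLiMP.
SCORES = {
    "Gros-Michel-90M": [30.26, 41.50, 59.41, 78.35],
    "Michel-Nano-v2": [27.40, 35.90, 56.75, 72.52],
    "GPT-S-5M": [27.39, 33.16, 57.13, 72.21],
    "pythia-31m": [27.14, 33.88, 56.26, 67.78],
    "Isabel-50M": [27.1, 43.81, 57.12, 73.75],
}


def macro_average(scores: list[float]) -> float:
    """Unweighted mean: every benchmark counts equally."""
    return sum(scores) / len(scores)


averages = {model: macro_average(scores) for model, scores in SCORES.items()}

# Print models from highest to lowest average.
for model, avg in sorted(averages.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{model}: {avg:.2f}%")
```

The recomputed values agree with the Average column to within rounding.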

German Benchmarks

Model	arc_de acc	arc_de acc_norm	hellaswag_de acc	hellaswag_de acc_norm	m_mmlu_de acc	truthfulqa_de_mc1 acc	truthfulqa_de_mc2 acc
Gros-Michel-90M-Base	0.1865	0.2284	0.2697	0.2852	0.2346	0.2348	0.4285
nanochat German v1	0.2241	0.2626	0.3203	0.3581	0.2285	0.2500	0.4184
LLäMmlein-120M	0.1942	0.2301	0.2945	0.3178	0.2285	0.2310	0.4055
LLäMmlein-1B	0.2515	0.2960	0.3703	0.4490	0.2317	0.2322	0.3617

Notice

This model has not undergone any alignment, and therefore may produce harmful content.

Evaluation was done with EleutherAI's lm-evaluation-harness. All benchmark scores are zero-shot and use normalized accuracy where applicable.
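A zero-shot run of this kind can be sketched with the harness's Python entry point. This is a rough sketch, not the exact command used for the numbers above: the task names, the `"acc_norm,none"` / `"acc,none"` result keys (the format used by recent 0.4.x releases), and both helper names are assumptions.

```python
# Assumed harness task names for the English benchmarks in the comparison table.
ENGLISH_TASKS = ["hellaswag", "arc_easy", "piqa", "blimp"]


def preferred_accuracy(task_result: dict) -> float:
    """Return normalized accuracy when the task reports it, else plain accuracy."""
    for key in ("acc_norm,none", "acc,none"):
        if key in task_result:
            return task_result[key]
    raise KeyError("no accuracy metric found in task result")


def evaluate_zero_shot(model_id: str, tasks: list[str]) -> dict[str, float]:
    """Run the harness with zero few-shot examples and pick one score per task."""
    import lm_eval  # lazy import; requires `pip install lm-eval`

    output = lm_eval.simple_evaluate(
        model="hf",
        model_args=f"pretrained={model_id}",
        tasks=tasks,
        num_fewshot=0,
    )
    # Keep only the tasks that report an accuracy metric under the assumed keys.
    return {
        name: preferred_accuracy(result)
        for name, result in output["results"].items()
        if "acc_norm,none" in result or "acc,none" in result
    }


# Example call (downloads the model and the benchmark datasets):
# print(evaluate_zero_shot("finnianx/Gros-Michel-90m-Base", ENGLISH_TASKS))
```

`preferred_accuracy` mirrors the "normalized accuracy where applicable" rule: it falls back to plain accuracy only when a task does not report a normalized score.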

Future plans

Sometime in the near(ish) future I will release an instruction-tuned variant of this model, along with a translation-focused finetune. GGUF support will also come in the near(ish) future.

Model size: 91.1M params (Safetensors, tensor type F16)
