
SAM-G

SAM-G is a 30.3M-parameter dual-mode language model for offline structured action generation. Given a natural-language instruction it emits compact, schema-valid JSON for ten domains; given a question it emits free text. Mode selection is learned, not prompted. Built by AMEFORGE for robotics, IoT and embedded deployment where hosted-LLM APIs are too costly, too slow, or unavailable.

  • Parameters: 30.3M · Footprint: 121 MB fp32 (~30 MB int8)
  • Context: 1024 tokens · Languages: English, French (actions)
  • Throughput: ~235 tok/s, 16 ms first-token (single GPU); runs on a Raspberry-Pi-class CPU
  • Released: model weights + inference tokenizer. Training pipeline, data generators and architecture are proprietary.
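
The footprint figures follow from parameter count times bytes per weight. A quick check, assuming 4 bytes per fp32 weight and 1 byte per int8 weight, and ignoring any quantization metadata (the helper name `footprint_mb` is illustrative only):

```python
PARAMS = 30.3e6  # parameter count from the list above

def footprint_mb(params: float, bytes_per_weight: float) -> float:
    """Approximate weight storage in megabytes (1 MB = 10**6 bytes)."""
    return params * bytes_per_weight / 1e6

print(round(footprint_mb(PARAMS, 4)))  # fp32 -> 121
print(round(footprint_mb(PARAMS, 1)))  # int8 -> 30
```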

Two modes

| Input | Model emits |
|---|---|
| turn on the kitchen lamp | [ACTION] {"domain":"home","op":"set_state","params":{"device":"lamp","name":"kitchen","state":"on"}} |
| what is a mutex | [CHAT] A mutex is a lock that allows one thread at a time. |

Domains: ros, http, mqtt, db, workflow, ecommerce, vehicle, home, cal, file.
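
Downstream code usually needs to branch on the mode before doing anything else. The sketch below assumes the decoded output starts with the literal tag `[ACTION]` or `[CHAT]`, as in the examples above; `split_mode` is a hypothetical helper, not part of the release.

```python
import json

# The ten domains listed above.
DOMAINS = {"ros", "http", "mqtt", "db", "workflow",
           "ecommerce", "vehicle", "home", "cal", "file"}

def split_mode(output: str):
    """Return (mode, payload): a dict for actions, a string for chat."""
    text = output.strip()
    if text.startswith("[ACTION]"):
        # json.loads raises ValueError (JSONDecodeError) on malformed JSON.
        action = json.loads(text[len("[ACTION]"):])
        if not isinstance(action, dict) or action.get("domain") not in DOMAINS:
            raise ValueError("action is not an object with a known domain")
        return "action", action
    if text.startswith("[CHAT]"):
        return "chat", text[len("[CHAT]"):].strip()
    raise ValueError("output has no recognised mode tag")
```

Rejecting untagged output, rather than guessing a mode, keeps a decoding failure from being executed as a command.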

Benchmark

SAM-G is evaluated zero-shot in its native format; baselines run 3-shot through their chat template with a system instruction. bpb (bits per byte) is tokenizer-fair, whereas per-token perplexity is not comparable across vocabularies. exact/M is action exact-match (%) divided by parameter count in millions; it is the efficiency axis.

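| Model | Params | bpb ↓ | JSON valid % | Exact % | Exact FR % | Cloze % | MB | tok/s | exact/M ↑ |
|---|---|---|---|---|---|---|---|---|---|
| SAM-G | 30.3M | 1.179 | 100 | 76 | 77 | 83 | 121 | 235 | 2.51 |
| Pythia-70M | 70M | 1.674 | 2 | 0 | 0 | 75 | 141 | 120 | 0.00 |
| Qwen2.5-0.5B-Instruct | 494M | 0.814 | 99 | 25 | 7 | 96 | 988 | 27 | 0.05 |
| SmolLM2-360M-Instruct | 362M | 0.812 | 96 | 14 | 0 | 96 | 724 | 21 | 0.04 |
| Qwen2.5-1.5B-Instruct | 889M | 0.753 | 98 | 21 | 0 | 96 | 444* | 13 | 0.02 |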

*Qwen2.5-1.5B loaded in 4-bit.

Larger general models lead on bits-per-byte and cloze (they are 12–30× bigger and trained for general knowledge); SAM-G leads decisively on structured action, French actions, footprint, speed, and exact-match per parameter. Notably, Qwen2.5-1.5B scores below Qwen2.5-0.5B on action exact-match (21% vs 25%), which suggests that capability on this task comes from domain specialization rather than scale.
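
The two derived metrics can be reproduced with a few lines. The exact/M column is the exact-match percentage divided by the parameter count in millions, using the values reported in the table. The card does not say how bpb was computed; `bits_per_byte` below shows the usual definition (total negative log-likelihood in nats, converted to bits and divided by the UTF-8 byte count) and is an assumption, not the authors' evaluation code.

```python
import math

# (params in millions, action exact-match %) copied from the benchmark table
ROWS = {
    "SAM-G": (30.3, 76),
    "Pythia-70M": (70, 0),
    "Qwen2.5-0.5B-Instruct": (494, 25),
    "SmolLM2-360M-Instruct": (362, 14),
    "Qwen2.5-1.5B-Instruct": (889, 21),
}

def exact_per_million(params_m: float, exact_pct: float) -> float:
    """Exact-match percentage per million parameters."""
    return exact_pct / params_m

def bits_per_byte(total_nll_nats: float, n_bytes: int) -> float:
    """Summed NLL over a text (in nats) normalised by its UTF-8 byte length."""
    return total_nll_nats / (n_bytes * math.log(2))

for name, (params_m, exact_pct) in ROWS.items():
    print(f"{name}: {exact_per_million(params_m, exact_pct):.2f}")
```

Because bpb normalises by bytes rather than tokens, a model with a larger vocabulary gains nothing from emitting fewer tokens for the same text.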

Per-domain exact match (%)

| ros | http | mqtt | db | workflow | ecommerce | vehicle | home | cal | file |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 100 | 100 | 100 | 60 | 100 | 100 | 50 | 80 | 60 |

All general baselines score 0 on most domains, succeeding only partially on the most generic ones (home, cal). ros (floating-point fields) is SAM-G's weakest schema and benefits most from additional training data.

Usage

import sentencepiece as spm
import torch

# Load the released inference tokenizer (samg_tokenizer.model).
# Loading the weights is not shown here.
sp = spm.SentencePieceProcessor()
sp.Load("samg_tokenizer.model")

prompt = "publish 21.5 on sensors/temp qos 1 [ACTION]"
ids = torch.tensor([sp.EncodeAsIds(prompt)])  # shape: (1, sequence_length)
# Greedy-decode with your loaded model until EOS, then call sp.DecodeIds(...) on the generated ids.
# Expected output:
# {"domain":"mqtt","op":"publish","params":{"topic":"sensors/temp","payload":21.5,"qos":1}}
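
The greedy-decoding step mentioned in the comments can be sketched independently of the model. Since the architecture and loading code are not published, `next_token_logits` below is a hypothetical callable that wraps your loaded model and returns one score per vocabulary entry for the next position; the loop itself is ordinary argmax decoding capped at the 1024-token context.

```python
from typing import Callable, List, Sequence

MAX_CONTEXT = 1024  # context length stated in the summary above

def greedy_decode(next_token_logits: Callable[[List[int]], Sequence[float]],
                  prompt_ids: List[int],
                  eos_id: int,
                  max_new_tokens: int = 128) -> List[int]:
    """Return the newly generated token ids, excluding the prompt and EOS."""
    ids = list(prompt_ids)
    generated: List[int] = []
    while len(generated) < max_new_tokens and len(ids) < MAX_CONTEXT:
        logits = next_token_logits(ids)
        # Greedy choice: index of the highest-scoring token.
        next_id = max(range(len(logits)), key=lambda i: logits[i])
        if next_id == eos_id:
            break
        ids.append(next_id)
        generated.append(next_id)
    return generated
```

With the tokenizer loaded as above, `eos_id` would typically come from `sp.eos_id()`, and the returned ids can be passed to `sp.DecodeIds(...)`.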

Always parse output as JSON and validate against your schema before execution.
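
A minimal sketch of that check for the MQTT example, using only the standard library. The field names and types in `PUBLISH_PARAMS` are inferred from the single example output above and are an assumption; the released schemas may differ, so substitute your own.

```python
import json

# Hypothetical schema for the mqtt "publish" op, inferred from the example above.
PUBLISH_PARAMS = {"topic": str, "payload": (int, float, str), "qos": int}

def validate_publish(raw: str) -> dict:
    """Parse model output and reject anything that is not a well-formed publish action."""
    action = json.loads(raw)  # raises ValueError (JSONDecodeError) on malformed JSON
    if not isinstance(action, dict):
        raise ValueError("action must be a JSON object")
    if action.get("domain") != "mqtt" or action.get("op") != "publish":
        raise ValueError("not an mqtt publish action")
    params = action.get("params")
    if not isinstance(params, dict) or set(params) != set(PUBLISH_PARAMS):
        raise ValueError("unexpected or missing params")
    for key, expected in PUBLISH_PARAMS.items():
        value = params[key]
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            raise ValueError(f"bad type for {key}")
    if params["qos"] not in (0, 1, 2):  # MQTT defines QoS levels 0, 1 and 2
        raise ValueError("qos must be 0, 1 or 2")
    return action
```

Only the dictionary returned by the validator should reach the code that talks to the broker; any `ValueError` means the command is dropped or escalated.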

Intended use

On-device home automation; NL→ROS robot command layers; MQTT fleet gateways; offline vehicle commands; NL-to-SQL on embedded databases; workflow triggers; and the structured tool-calling stage of agentic pipelines — as a drop-in replacement or a fast router ahead of a larger hosted model.
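
The router pattern can be sketched as follows. Both `local_model` and `hosted_model` are hypothetical callables standing in for SAM-G and a hosted LLM client, and `local_model` is assumed to return the JSON string with any mode tag already stripped; this is one possible arrangement, not a documented integration.

```python
import json
from typing import Callable, FrozenSet

def route(instruction: str,
          local_model: Callable[[str], str],
          hosted_model: Callable[[str], str],
          allowed_domains: FrozenSet[str]) -> dict:
    """Use the local action if it parses and is in an allowed domain, else fall back."""
    raw = local_model(instruction)
    try:
        action = json.loads(raw)
    except ValueError:  # malformed JSON from the local model
        action = None
    if isinstance(action, dict) and action.get("domain") in allowed_domains:
        return {"source": "local", "action": action}
    # Fall back to the larger model only when the local result is unusable.
    return {"source": "hosted", "text": hosted_model(instruction)}
```

The hosted model is called only on the fallback path, so requests the local model handles never leave the device.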

Limitations

  • Not a general assistant: factual knowledge and open-ended reasoning are limited at this scale; larger general models lead on bits-per-byte and cloze.
  • French covers actions, not extended prose.
  • Schemas outside the ten domains need fine-tuning. The ros schema (floating-point fields) is the weakest and benefits most from more data.
  • The action benchmark is synthetic, drawn from the training distribution family with a disjoint evaluation seed (999).

Citation

@misc{samg2026,
  title  = {SAM-G: A 30M-Parameter Dual-Mode Language Model for Offline Structured Action Generation},
  author = {AMEFORGE Lab},
  year   = {2026}
}