# mind-mem-4b v2 (paired with mind-mem)
A governance-aware memory-assistant model for mind-mem — an auditable, contradiction-safe memory layer for coding agents (MCP-compatible).
v2 supersedes v1. This checkpoint is a full fine-tune of the v1 star-ga/mind-mem-4b checkpoint (itself QLoRA-merged from Qwen/Qwen3.5-4B), retrained on the v3.3.0 MCP surface: the 7 consolidated dispatcher tool-calls, query decomposition / reformulation, entity extraction, and evidence-bundle consumption.
## Pair with mind-mem
This checkpoint is the recommended extractor / utility model for mind-mem on PyPI. mind-mem provides:
- Governance rationale chain — every `propose_update` (decisions), `reject_proposal` (new tool), and `rollback_proposal` now requires a written reason (≥ 8 non-whitespace chars). The rationale is committed inside the proposal's Markdown block as a timestamped audit record. A minimal sketch of the length rule follows this list.
- 58 MCP tools (`reject_proposal` added to the ACL-gated surface).
- LoCoMo bench harness routes any model name prefixed with `mind-mem` through your local Ollama endpoint, so this checkpoint can act as the utility model for query rewrites, two-stage extraction, and consensus voting — fully offline.
- LoCoMo conv-0 baseline 86.33 (Mistral-Large answerer + judge), +8.4 over the published v1.9.0 score of 77.9. Ahead of all publicly benchmarked competitors except Mem0's 2026 managed platform (91.6). A LoCoMo-specific fine-tune of this model is tracked for v3.7.
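The rationale length rule is easy to mirror locally; here is a minimal Python sketch, where the function name and error message are mine and only the ≥ 8 non-whitespace-character threshold comes from the description above.

```python
def validate_rationale(reason: str) -> None:
    # Mirrors the governance gate described above: a rationale must
    # contain at least 8 non-whitespace characters to be accepted.
    if len("".join(reason.split())) < 8:
        raise ValueError("rationale too short: need >= 8 non-whitespace characters")

validate_rationale("Supersedes DEC-012 after the OAuth rollout")  # passes
# validate_rationale("lgtm ok")  # raises: only 6 non-whitespace characters
```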
Install / upgrade:

```bash
pip install -U mind-mem
ollama pull mind-mem:4b   # pulls this checkpoint
```
## What's new in v2
| | v1 (2026-02-14) | v2 (2026-04-21) |
|---|---|---|
| Method | QLoRA rank-16 adapters, 10GB RTX 3080 | Full fine-tune, all 4.2B params trained |
| Hardware | RTX 3080 10GB | H200 NVL 141GB |
| MCP surface | 19 v2.10.0 tools | 7 v3.2.x dispatchers + v3.3.0 retrieval |
| Optimizer | Paged AdamW 8-bit | AdamW (fused) bf16 |
| Learning rate | 2e-4 | 5e-6 (gentler for full FT) |
| Epochs | 3 | 3 |
| Training examples | ~40K | 16,450 (curated) |
| Sequence length | 768 | 384 (matches p99 of corpus) |
| Gradient checkpointing | true | true |
| Effective batch size | 16 | 32 (batch 4 × accum 8) |
## What it knows about (v3.3.0)
- 7 MCP dispatchers (consolidated from 57 tools): `recall(mode="similar|verify|intent|diagnose|bundle")`, `staged_change(phase="propose|approve|rollback")`, `graph(action="expand|prefetch|contradict")`, `memory_verify`, `core`, `kernels`, `compiled_truth`.
- Query decomposition — splits multi-hop questions into sub-queries (multi-hop / temporal / causal patterns).
- Query reformulation — paraphrase rewrites for robustness.
- Entity extraction — `PER-`/`PRJ-`/`TOOL-`/`INC-` prefixes and capitalised proper nouns.
- Evidence bundle format — structured JSON with facts / relations / timeline / entities (`recall(format="bundle")`); see the sketch after this list.
- 7 typed graph edges — `cites`, `derived_from`, `depends_on`, `tested_by`, `supersedes`, `contradicts`, `relates_to`.
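To make the decomposition and bundle formats concrete, here is a hedged sketch of both output shapes. The four bundle keys, the entity prefixes, and the `depends_on` edge type come from the lists above; every other field name and value is invented for illustration.

```python
# Hypothetical decomposition of a multi-hop question into sub-queries.
decomposition = {
    "query": "What did Alice decide about OAuth before the March incident?",
    "sub_queries": ["Alice OAuth decision", "March incident timeline"],  # entity hop, temporal hop
}

# Hypothetical recall(format="bundle") payload with the four documented keys.
bundle = {
    "facts": [{"id": "F-001", "text": "Alice led the OAuth migration"}],
    "relations": [{"src": "PER-alice", "edge": "depends_on", "dst": "PRJ-oauth"}],
    "timeline": [{"ts": "2026-03-02", "event": "OAuth cutover approved"}],
    "entities": ["PER-alice", "PRJ-oauth", "TOOL-keycloak", "INC-0042"],
}
```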
## Usage

### Load bf16 (native)
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "star-ga/mind-mem-4b"
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL, torch_dtype="bfloat16", device_map="auto")

messages = [
    {"role": "system", "content": "You are mind-mem-4b, the local LLM that powers mind-mem's retrieval and governance surfaces. Respond with exactly the tool call or structured output the caller requested — no extra commentary."},
    {"role": "user", "content": "What did Alice say about the OAuth migration?"},
]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
out = model.generate(inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(out[0][inputs.shape[1]:], skip_special_tokens=True))
# → {"tool":"recall","args":{"mode":"similar","query":"Alice OAuth migration"}}
```
### Ollama (Q4_K_M GGUF, ~2.7GB)

```bash
ollama pull mind-mem:4b   # v2 auto-replaces v1
ollama run mind-mem:4b "What did Alice say about OAuth?"
```
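For programmatic access (e.g., as the bench harness's utility model), the same local instance can be queried over Ollama's standard HTTP API; a minimal sketch, with an illustrative prompt:

```python
import requests

# Query the pulled checkpoint through Ollama's local /api/generate endpoint.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "mind-mem:4b", "prompt": "What did Alice say about OAuth?", "stream": False},
    timeout=60,
)
print(resp.json()["response"])
```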
### vLLM / exllamav2 (high-throughput)

```bash
vllm serve star-ga/mind-mem-4b --dtype bfloat16 --port 8000
curl -s -X POST http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "star-ga/mind-mem-4b", "prompt": "...", "max_tokens": 128}'
```
## Training recipe

```yaml
base_model: star-ga/mind-mem-4b   # v1 checkpoint (not raw Qwen3.5-4B)
dtype: bfloat16
optim: adamw_torch_fused
learning_rate: 5.0e-6
lr_scheduler_type: cosine
warmup_ratio: 0.03
num_train_epochs: 3
per_device_train_batch_size: 4
gradient_accumulation_steps: 8    # effective batch 32
max_seq_length: 384               # p99 corpus length = 274 chars
packing: false
gradient_checkpointing: true
save_steps: 100
logging_steps: 10
data_collator: DataCollatorForLanguageModeling(mlm=false)
```
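For readers wiring the recipe up by hand, a hedged sketch with plain `transformers`: the recipe names `DataCollatorForLanguageModeling`, so a vanilla `Trainer` is one reasonable reading, but this is not the repo's actual training script, and the dataset path and `"text"` field are assumptions.

```python
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

MODEL = "star-ga/mind-mem-4b"  # v1 checkpoint, per the recipe
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL, torch_dtype="bfloat16")

# Assumed corpus location and JSONL schema ("text" field); mixed.jsonl is
# built in the "Reproducing v2 training" section below.
dataset = load_dataset("json", data_files="/tmp/mm-train/mixed.jsonl", split="train")
dataset = dataset.map(
    lambda ex: tokenizer(ex["text"], truncation=True, max_length=384),
    remove_columns=dataset.column_names,
)

args = TrainingArguments(
    output_dir="mm-4b-v2",
    bf16=True,
    optim="adamw_torch_fused",
    learning_rate=5e-6,
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
    num_train_epochs=3,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=8,  # effective batch 32
    gradient_checkpointing=True,
    save_steps=100,
    logging_steps=10,
)

Trainer(
    model=model,
    args=args,
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False),
).train()
```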
## Corpus composition (16,450 examples)

| Source | Count | Shape |
|---|---|---|
| `benchmarks/generate_dispatcher_examples.py` | 10,000 | (prompt, expected_call) pairs; covers all 7 dispatchers × modes |
| `benchmarks/generate_retrieval_examples.py` | 5,000 | (task, prompt, expected_output) — query decomposition, reformulation, entity extraction |
| v1 replay buffer (`/data/checkpoints/mm-workspace/train-output/corpus.jsonl`) | 1,450 | Legacy v2.10.0 MCP surface — keeps tool-name awareness |
All three corpus sources are deterministic — the two `generate_*_examples.py` scripts plus the frozen v1 corpus reproduce the 16,450-example JSONL byte-for-byte (provided the final shuffle uses a fixed seed; see the repro section below).
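An easy way to spot-check the claim: hash the regenerated file and compare digests across independent runs (the path matches the repro section below).

```python
import hashlib
from pathlib import Path

# Identical digests across two independent corpus builds confirm
# the regeneration is byte-for-byte reproducible.
print(hashlib.sha256(Path("/tmp/mm-train/mixed.jsonl").read_bytes()).hexdigest())
```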
## Loss curve

Published separately at `docs/mind-mem-4b-v2-loss-curve.json`. Key milestones:
| Step | Epoch | Loss | LR | Grad norm |
|---|---|---|---|---|
| 10 | 0.02 | 2.452 | 1.05e-06 | 86.0 |
| 50 | 0.11 | 0.3456 | 5.00e-06 | 7.97 |
| 100 | 0.21 | 0.1041 | 4.98e-06 | 4.56 |
| 200 | 0.43 | ~0.085 | 4.85e-06 | 1.9 |
| 300 | 0.64 | ~0.084 | 4.60e-06 | 1.6 |
| 1407 (final, epoch 3) | see commit message | see commit message | see commit message | see commit message |
The loss plateau at ~0.085 from step 150 onward is characteristic of the tool-call JSON output format (highly constrained vocabulary); the small further descent in epochs 2-3 reflects generalisation to unseen queries.
## Hardware
- GPU: single H200 NVL (141 GB HBM3e, 4.8 TB/s memory bandwidth)
- Memory used: ~41 GB / 141 GB VRAM at bf16 full FT (batch 4, seq 384, grad-ckpt on)
- Throughput: ~10 s/step, ~6 hr total wall time for 1407 steps
- Cost: ~$20 at $3.39/hr on-demand (Runpod US-MD-1)
## Eval

### LoCoMo (`benchmarks/locomo_judge.py`)
v2 is evaluated on the same LoCoMo 10-conversation suite as v1. Numbers will be published separately once the full sweep completes — v2 is the answerer-side component; retrieval-side improvements from v3.3.0 features (`graph_recall`, `entity_prefetch`, `rerank_ensemble`, `truth_score`, `answer_quality`) are measured in the same run.
| Target | Baseline (v1.1.0) | Current (v3.2.1) | v3.3.0 projected |
|---|---|---|---|
| Mean | 70.54 | ~76.7 (Opus conv-0) | ≥ 82 |
| Temporal | — | — | — |
| Adversarial | — | — | — |
### Tool-call accuracy

Replaying 500 held-out dispatcher prompts:

| Metric | v1 | v2 |
|---|---|---|
| Correct tool | 94% | pending |
| Correct mode | 89% | pending |
| Valid JSON | 98% | pending |
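For reference, a hedged sketch of scoring one replayed prompt on the three metrics above; the expected-call dict follows the (prompt, expected_call) shape from the corpus table, and the exact field names are assumptions.

```python
import json

def score(prediction: str, expected: dict) -> dict:
    """Score a single replayed prompt on the three metrics in the table."""
    try:
        call = json.loads(prediction)
    except json.JSONDecodeError:
        return {"valid_json": False, "correct_tool": False, "correct_mode": False}
    return {
        "valid_json": True,
        "correct_tool": call.get("tool") == expected.get("tool"),
        "correct_mode": call.get("args", {}).get("mode") == expected.get("args", {}).get("mode"),
    }

print(score('{"tool":"recall","args":{"mode":"similar","query":"Alice OAuth"}}',
            {"tool": "recall", "args": {"mode": "similar"}}))
# → {'valid_json': True, 'correct_tool': True, 'correct_mode': True}
```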
## What NOT to retrain for
v2 is a capability-expansion + architecture-shift release (QLoRA → full FT), not a quality-lift release. Do not:
- Retrain on v3.3.0 synthetic data only — the 1,450-example replay buffer is essential to keep v2.10.0 tool-name awareness.
- Use v2 as a drop-in replacement without updating the mind-mem-json dispatcher map (v1 emits 19-tool calls, v2 emits 7-dispatcher calls).
- Quantise below Q4_K_M — tool-call JSON starts degrading at Q3.
## Reproducing v2 training

```bash
git clone https://github.com/star-ga/mind-mem.git
cd mind-mem

# 1. Generate corpus
mkdir -p /tmp/mm-train
python3 benchmarks/generate_dispatcher_examples.py \
  --output /tmp/mm-train/dispatchers.jsonl --count 10000
python3 benchmarks/generate_retrieval_examples.py \
  --output /tmp/mm-train/retrieval.jsonl --count 5000

# Mix with a seeded random source so the shuffle (and hence the corpus)
# is byte-for-byte reproducible; /dev/urandom would break that guarantee.
# The seed phrase "mind-mem" is an arbitrary fixed choice.
cat /data/checkpoints/mm-workspace/train-output/corpus.jsonl \
    /tmp/mm-train/dispatchers.jsonl /tmp/mm-train/retrieval.jsonl \
  | shuf --random-source=<(openssl enc -aes-256-ctr -pass pass:mind-mem -nosalt </dev/zero 2>/dev/null) \
  > /tmp/mm-train/mixed.jsonl

# 2. Spin up Runpod H200 NVL (141GB), 100GB container + 100GB volume
# 3. On pod — one-shot kickoff
curl -sSL https://raw.githubusercontent.com/star-ga/mind-mem/main/benchmarks/runpod_kickoff.sh \
  | HF_TOKEN=<write-scope> bash
```
## Citation

```bibtex
@software{mind_mem_4b_v2_2026,
  author    = {STARGA Inc.},
  title     = {mind-mem-4b v2: Governance-Aware Memory Model for MCP Agents},
  year      = {2026},
  publisher = {HuggingFace},
  url       = {https://huggingface.co/star-ga/mind-mem-4b},
  note      = {Full fine-tune on 16,450 v3.3.0 MCP examples.}
}
```
## Links

- GitHub: [star-ga/mind-mem](https://github.com/star-ga/mind-mem)
- PyPI: `pip install mind-mem`
- v3.3.0 docs: `docs/` in the repo — includes feature gate catalogue, protection model, Runpod kickoff.
- Protection model: `docs/protection.md` — integrity manifests, strict-mode import, provenance chain.