Mistral-Mamba3-7B (Alpha)

Cross-architecture Subsuminator heist: mistralai/Mistral-7B-Instruct-v0.3 → Mamba3-7B SSM body.

CE gate (measured)

Metric                  Value
Protocol                subsuminator/subsume.py CE gate (random tokens, shifted CE)
Source CE               13.3000
Heisted CE              10.4125
log(vocab) baseline     10.3972
CE ratio vs source      0.7829×

Random-token next-step CE on mistralai/Mistral-7B-Instruct-v0.3 vs this checkpoint (seed=42, n=5, seq_len=32).
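As a rough illustration of what a random-token, shifted-CE measurement looks like, here is a minimal pure-Python sketch. It is not the code in subsuminator/subsume.py: the helper names, the toy vocabulary size, and the `uniform_model` stand-in are all assumptions made for illustration. Only seed=42, n=5, seq_len=32 come from the line above.

```python
import math
import random


def log_softmax(logits):
    """Numerically stable log-softmax over one list of logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def shifted_ce(logits_per_pos, tokens):
    """Mean next-token CE: logits at position t are scored against tokens[t + 1]."""
    losses = []
    for t in range(len(tokens) - 1):
        logp = log_softmax(logits_per_pos[t])
        losses.append(-logp[tokens[t + 1]])
    return sum(losses) / len(losses)


def uniform_model(tokens, vocab_size):
    """Hypothetical stand-in model: identical logits for every token at every position."""
    return [[0.0] * vocab_size for _ in tokens]


def random_token_ce(model, vocab_size, seed=42, n=5, seq_len=32):
    """Average shifted CE over n sequences of uniformly sampled random tokens."""
    rng = random.Random(seed)
    per_seq = []
    for _ in range(n):
        tokens = [rng.randrange(vocab_size) for _ in range(seq_len)]
        per_seq.append(shifted_ce(model(tokens, vocab_size), tokens))
    return sum(per_seq) / n


VOCAB = 1000  # assumption: small toy vocabulary so the sketch runs quickly
ce = random_token_ce(uniform_model, VOCAB)
print(f"CE = {ce:.4f}, log(vocab) = {math.log(VOCAB):.4f}")
```

A model that outputs the same logit for every token lands exactly on log(vocab), which is the baseline the Heisted CE is compared against.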

Interpretation: Heisted CE is approximately log(vocab) (10.41 vs 10.40), so the fresh SSM body behaves like a uniform random predictor on uniformly sampled random tokens. The ratio of 0.7829× vs source (10.41 / 13.30) is below 1.0 only because Mistral's trained transformer assigns less than uniform probability to the actual next tokens on garbage random inputs, which pushes its CE above the log(vocab) baseline; this is not capability preservation. For comparison, the Mamba2→Mamba3 structural port (trained→trained) achieved 1.0016× with CE near source.
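The arithmetic behind the table can be checked in a few lines. This sketch assumes a vocabulary size of 32768, which is inferred from the reported baseline (exp(10.3972) ≈ 32768) rather than stated on this card. The CE values are copied from the table above.

```python
import math

VOCAB_SIZE = 32768   # assumption: inferred from the reported log(vocab) baseline
SOURCE_CE = 13.3000  # from the table above
HEISTED_CE = 10.4125 # from the table above

# CE of a predictor that assigns probability 1/V to every token.
baseline = math.log(VOCAB_SIZE)

# Ratio reported as "CE ratio vs source".
ratio = HEISTED_CE / SOURCE_CE

# How far the heisted checkpoint sits above the uniform baseline, in nats.
gap = HEISTED_CE - baseline

print(f"log(vocab) baseline: {baseline:.4f}")
print(f"CE ratio vs source:  {ratio:.4f}")
print(f"heisted - baseline:  {gap:.4f} nats")
```

The gap of roughly 0.015 nats above the uniform baseline is what supports reading the checkpoint as near-random on this gate, independent of the ratio.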

What transferred

  • Token embeddings, final norm, lm_head, per-layer input norms from Mistral-7B

What did not (fresh orthogonal init)

  • All Mamba3 SSM mixer weights (in_proj, out_proj, dt_bias, state dynamics)
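The split described in the two lists above can be pictured as a filter over parameter names. The sketch below is illustrative only: the patterns and the example parameter names (including the `mixer.*` names) are hypothetical and are not taken from this checkpoint or from the Subsuminator code.

```python
import re

# Hypothetical name patterns for the tensors listed under "What transferred".
TRANSFERRED_PATTERNS = [
    r"embed_tokens\.",       # token embeddings
    r"^model\.norm\.",       # final norm
    r"^lm_head\.",           # output head
    r"\.input_layernorm\.",  # per-layer input norms
]


def split_params(names):
    """Partition parameter names into (copied from source, freshly initialized)."""
    transferred, fresh = [], []
    for name in names:
        if any(re.search(p, name) for p in TRANSFERRED_PATTERNS):
            transferred.append(name)
        else:
            fresh.append(name)
    return transferred, fresh


# Hypothetical parameter names, for illustration only.
example_names = [
    "model.embed_tokens.weight",
    "model.layers.0.input_layernorm.weight",
    "model.layers.0.mixer.in_proj.weight",
    "model.layers.0.mixer.out_proj.weight",
    "model.layers.0.mixer.dt_bias",
    "model.norm.weight",
    "lm_head.weight",
]

transferred, fresh = split_params(example_names)
print("transferred:", transferred)
print("fresh init: ", fresh)
```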

Alpha checkpoint — fine-tune before production use.

Run it (Avocado)

Sovereign local inference for mamba2 + mamba3 only:

  • rideitlikeyoustoleit — static Avocado binaries (--yeehaw, --arnie, --giddyup)
  • Download a release build and point it at this checkpoint, using either trust_remote_code for the HF load or the Avocado native splat path
./avocado run --model RtaForge/Mistral-Mamba3-7B --prompt "Come with me if you want to live."

Usage (trust_remote_code required)

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("RtaForge/Mistral-Mamba3-7B", trust_remote_code=True)
tok = AutoTokenizer.from_pretrained("RtaForge/Mistral-Mamba3-7B")