Mistral-Mamba3-7B (Alpha)
Cross-architecture Subsuminator heist: mistralai/Mistral-7B-Instruct-v0.3 → Mamba3-7B SSM body.
CE gate (measured)
| Metric | Value |
|---|---|
| Protocol | subsuminator/subsume.py CE gate (random tokens, shifted CE) |
| Source CE | 13.3000 |
| Heisted CE | 10.4125 |
| log(vocab) baseline | 10.3972 |
| CE ratio vs source | 0.7829× |
Random-token next-step CE on mistralai/Mistral-7B-Instruct-v0.3 vs this checkpoint (seed=42, n=5, seq_len=32).
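The protocol in the table can be illustrated with a minimal pure-Python sketch. This is not the code in subsuminator/subsume.py: the helper names `shifted_ce`, `random_token_batch` and `uniform_model` are hypothetical, the model is replaced by a stand-in that emits all-zero logits, and a small vocabulary is assumed to keep the example fast. It only shows what "random tokens, shifted CE" means: logits at position t are scored against the token at position t+1.

```python
import math
import random

def log_softmax(logits):
    """Numerically stable log-softmax over one list of logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]

def shifted_ce(logits, tokens):
    """Mean next-token cross-entropy: logits[t] is scored against tokens[t + 1]."""
    losses = []
    for t in range(len(tokens) - 1):
        losses.append(-log_softmax(logits[t])[tokens[t + 1]])
    return sum(losses) / len(losses)

def random_token_batch(vocab_size, n, seq_len, seed):
    """n sequences of uniformly random token ids (hypothetical helper)."""
    rng = random.Random(seed)
    return [[rng.randrange(vocab_size) for _ in range(seq_len)] for _ in range(n)]

def uniform_model(tokens, vocab_size):
    """Stand-in for a model with no learned preferences: all-zero logits."""
    return [[0.0] * vocab_size for _ in tokens]

VOCAB = 512  # small illustrative vocabulary, NOT the real tokenizer size
batch = random_token_batch(VOCAB, n=5, seq_len=32, seed=42)
mean_ce = sum(shifted_ce(uniform_model(seq, VOCAB), seq) for seq in batch) / len(batch)
print(f"mean CE = {mean_ce:.4f}, log(vocab) = {math.log(VOCAB):.4f}")
```

With all-zero logits every token gets probability 1/vocab, so the mean CE equals log(vocab) regardless of which random tokens are drawn. A real gate would swap `uniform_model` for a forward pass through each checkpoint.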
Interpretation: Heisted CE ≈ log(vocab) (10.41 vs 10.40), so the freshly initialized SSM body predicts close to uniformly over the vocabulary, as an untrained baseline would on uniformly random tokens. The 0.7829× ratio vs source (10.41 / 13.30) is below 1.0 only because Mistral's trained transformer makes confident, wrong predictions on random input, which pushes its CE above log(vocab); the ratio is not evidence of capability preservation. For comparison, a Mamba2→Mamba3 structural port (trained→trained) achieved 1.0016× with CE near source.
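The arithmetic behind these figures can be checked directly from the table values. The sketch below assumes CE is reported in nats (natural log), which is consistent with the log(vocab) row; the vocabulary size is not stated in this card and is only inferred here by exponentiating that row.

```python
import math

# Values copied from the CE gate table
source_ce = 13.3000
heisted_ce = 10.4125
log_vocab_reported = 10.3972

# Ratio of heisted CE to source CE, as in the "CE ratio vs source" row
ratio = heisted_ce / source_ce

# Vocabulary size implied by the log(vocab) row (an inference, not a stated fact)
implied_vocab = math.exp(log_vocab_reported)

# How far the heisted checkpoint sits above the uniform-prediction baseline
gap = heisted_ce - log_vocab_reported

print(f"ratio         = {ratio:.4f}")
print(f"implied vocab = {implied_vocab:.0f}")
print(f"gap           = {gap:.4f} nats")
```

The gap of roughly 0.015 nats above the uniform baseline is what the interpretation relies on: the heisted checkpoint is nearly indistinguishable from uniform prediction on this test.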
What transferred
- Token embeddings, final norm, lm_head, per-layer input norms from Mistral-7B
What did not (fresh orthogonal init)
- All Mamba3 SSM mixer weights (`in_proj`, `out_proj`, `dt_bias`, state dynamics)
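The split between transferred and non-transferred tensors can be pictured as a filter over the source checkpoint's parameter names. The sketch below is illustrative only and is not taken from the Subsuminator code: `split_keys` and `TRANSFER_PATTERNS` are hypothetical names, the key strings assume the usual Hugging Face Mistral naming (`model.embed_tokens`, `model.norm`, `lm_head`, `input_layernorm`), and it assumes "per-layer input norms" means only `input_layernorm`. How the Mamba3 side names its parameters is not covered here.

```python
import re

# Assumed source-side names for the four groups listed under "What transferred"
TRANSFER_PATTERNS = [
    r"model\.embed_tokens\.weight",                  # token embeddings
    r"model\.norm\.weight",                          # final norm
    r"lm_head\.weight",                              # output head
    r"model\.layers\.\d+\.input_layernorm\.weight",  # per-layer input norms
]

def split_keys(source_keys):
    """Partition source parameter names into (transferred, dropped)."""
    transferred, dropped = [], []
    for key in source_keys:
        if any(re.fullmatch(p, key) for p in TRANSFER_PATTERNS):
            transferred.append(key)
        else:
            dropped.append(key)  # attention/MLP weights have no SSM counterpart
    return transferred, dropped

# Tiny illustrative key list, not a real checkpoint dump
example_keys = [
    "model.embed_tokens.weight",
    "model.layers.0.input_layernorm.weight",
    "model.layers.0.self_attn.q_proj.weight",
    "model.layers.0.post_attention_layernorm.weight",
    "model.layers.0.mlp.gate_proj.weight",
    "model.norm.weight",
    "lm_head.weight",
]
transferred, dropped = split_keys(example_keys)
print("transferred:", transferred)
print("dropped:", dropped)
```

Everything on the Mamba3 mixer side would then be left at its fresh initialization, which is why the CE gate lands near log(vocab).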
Alpha checkpoint — fine-tune before production use.
Run it (Avocado)
Sovereign local inference for mamba2 + mamba3 only:
- rideitlikeyoustoleit: static Avocado binaries (`--yeehaw`, `--arnie`, `--giddyup`)
- Download a release build, point at this checkpoint, `trust_remote_code` for HF load or Avocado native splat path
./avocado run --model RtaForge/Mistral-Mamba3-7B --prompt "Come with me if you want to live."
Usage (trust_remote_code required)
from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained("RtaForge/Mistral-Mamba3-7B", trust_remote_code=True)
tok = AutoTokenizer.from_pretrained("RtaForge/Mistral-Mamba3-7B")