prism-coder:4b — Prism Memory Tool Router

Prompt-engineered Qwen3.5-4B for MCP tool routing in the Prism Coder system. No fine-tuning — the system prompt IS the specialization.

Downloads

File Quantization Size BFCL Accuracy Use when
Qwen3.5-4B-Q3_K_M.gguf Q3_K_M 2.3 GB 99.1% × 3 seeds iPhone / mobile first gate
(stock via Ollama) Q4_K_M 3.4 GB 100% × 3 seeds Mac / 8 GB+ devices

Quick Start

# iPhone-optimized (2.3 GB, 99.1%)
ollama pull dcostenco/prism-coder:2b

# Full quality (3.4 GB, 100%)
ollama pull dcostenco/prism-coder:4b

BFCL Benchmark

Q3_K_M (prism-coder:2b) — 99.1% × 3 seeds

114/115 × 3 shuffled runs = 99.1%, 1 flaky case

Category Count Accuracy
save 17 100%
smem 17 100%
aac 12 100%
hand 12 100%
irrel 10 90%
load 9 100%
pred 8 100%
know 7 100%
cmpct 6 100%
edge 6 100%
tran 6 100%
info 5 100%

Single failure: "Write a regex to match email addresses" → knowledge_search instead of plain.

Q4_K_M (prism-coder:4b) — 100% × 3 seeds

115/115 × 3 shuffled runs = 100.0%, 0 flaky

Architecture

Qwen3.5-4B uses a hybrid attention architecture:

  • 24 linear attention layers (Gated DeltaNet) — O(n) inference
  • 8 full attention layers (standard softmax) — precise retrieval

This hybrid design is why prompt-only routing works at 4B scale but not smaller. The 8 full-attention layers are sufficient to hold the routing rules when combined with the DeltaNet layers' pattern matching.

Fleet Position

Model Ollama tag Size BFCL Role
Qwen3.5-4B Q3_K_M dcostenco/prism-coder:2b 2.3 GB 99.1% iPhone / mobile
Qwen3.5-4B Q4_K_M dcostenco/prism-coder:4b 3.4 GB 100% Verifier / 8 GB+
Qwen3.5-9B Q4_K_M dcostenco/prism-coder:9b 5.8 GB 100% Default router
prism-coder:32b dcostenco/prism-coder:32b 19 GB 100% Complex tasks

Links

Downloads last month
225
GGUF
Model size
4B params
Architecture
qwen35
Hardware compatibility
Log In to add your hardware

3-bit

4-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for dcostenco/prism-coder-4b

Finetuned
Qwen/Qwen3.5-4B
Quantized
(256)
this model