Zen5

Canonical default of the Zen5 family. Multimodal sparse MoE (image + text in → text out) with 35B total / 3B active parameters per token, 256K context. The everyday Zen5 model — agentic-trained, fast at scale, frontier-quality vision-language reasoning at a 3B-active compute budget.

Part of the canonical Zen5 ladder:

SKU Hardware fit This repo
zen5-flash anything (4 GB VRAM) zen-5-flash-gguf
zen5-mini 32 GB zen-5-mini-gguf
zen5 (default) 24 GB+ VRAM (Q4_K) ← you are here
zen5-pro Mac M4 Max / DGX Spark / H100 80GB zen-5-pro-gguf
zen5-max Mac Studio M3 Ultra 512GB / 8x H100 zen-5-max-gguf

Files

File Format
main GGUF (*-Q4_K.gguf) GGUF Q4_K (text + vision), refusal-orthogonalized
mmproj-model-f16.gguf multimodal vision projector — load alongside the main GGUF for image input

Run

Hosted via the Hanzo gateway (api.hanzo.ai) as zen5.

Local with llama.cpp (CLI / server) or zen5-engine:

hf download zenlm/zen-5-gguf --local-dir gguf
MAIN=$(ls gguf/*-Q4_K.gguf | head -1)

# text-only chat
llama-cli -m "$MAIN" -p "Explain MoE inference."

# vision-language (image input)
llama-cli -m "$MAIN" \
          --mmproj gguf/mmproj-model-f16.gguf \
          --image path/to/screenshot.png \
          -p "Describe this UI and propose a fix."
Downloads last month
484
GGUF
Model size
36B params
Architecture
qwen35moe
Hardware compatibility
Log In to add your hardware
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Collection including zenlm/zen-5-gguf