Zen5 Chat Ladder
Collection
Canonical Zen5 lineup, smallest to largest. • 6 items • Updated
Code-specialized member of the Zen5 family. 80B-parameter sparse MoE tuned for repo-scale code understanding, agentic refactoring, and tool-use coding loops.
Part of the canonical Zen5 ladder:
| SKU | Hardware fit | This repo |
|---|---|---|
zen5-flash |
anything (4 GB VRAM) | zen-5-flash-gguf |
zen5-mini |
32 GB | zen-5-mini-gguf |
zen5 (default) |
24 GB+ VRAM (Q4_K) | zen-5-gguf |
zen5-coder |
48 GB+ VRAM (Q4_K_M) | ← you are here |
zen5-pro |
Mac M4 Max / DGX Spark / H100 80GB | zen-5-pro-gguf |
zen5-max |
Mac Studio M3 Ultra 512GB / 8x H100 | zen-5-max-gguf |
A first-party zenlm GGUF mirror is staged for this repo. Until it lands, the recommended path is to use the hosted zen5-coder endpoint (see below) or pull a community 80B-class coder GGUF Q4_K_M into a local gguf/ directory.
Hosted via the Hanzo gateway (api.hanzo.ai) as zen5-coder — preferred until the first-party GGUF mirror lands.
Local with llama.cpp or a compatible runtime, once you have a GGUF in gguf/:
MAIN=$(ls gguf/*Q4_K_M*.gguf | head -1)
llama-cli -m "$MAIN" -p "Refactor this Python function to use async/await."