Ornith 1.0 9B

Ornith 1.0 9B, quantized to MLX 5-bit by Atomic Chat for Apple Silicon. Built straight from DeepReinforce's original weights. Runs fully offline on your Mac.

Highlights

  • A self-improving open-source family for agentic coding from DeepReinforce, built for tool-calling and terminal-based coding agents.
  • The smallest, fastest member of the Ornith 1.0 lineup, post-trained on top of Gemma 4 and Qwen 3.5.
  • Strong agentic coding scores for its size: 69.4 on SWE-bench Verified and 43.1 on Terminal-Bench 2.1 (Terminus-2).
  • Dense architecture, 32 layers, qwen3_5 model type with a hidden_size of 4096.
  • 262,144-token native context for long files and multi-step agent traces.
  • Pure open: MIT licensed, globally accessible with no regional limits.
  • Full quant ladder with an importance matrix on every quant over calibration_datav3.

This is the MLX 5-bit build for Apple Silicon (M-series). For llama.cpp, Ollama, or CPU inference, use the GGUF repo instead.
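If you are unsure which build applies to a given machine, a small check like the one below can help. It is only a sketch: `pick_build` is a hypothetical helper, not part of any library, and it assumes that Apple Silicon reports `Darwin` and `arm64`. A Python interpreter running under Rosetta reports `x86_64` and would be pointed at GGUF.

```python
import platform


def pick_build(system=None, machine=None):
    """Return "MLX" on Apple Silicon macOS, otherwise "GGUF"."""
    # Fall back to the current machine when no values are passed in.
    system = system or platform.system()
    machine = machine or platform.machine()
    if system == "Darwin" and machine == "arm64":
        return "MLX"
    return "GGUF"


print(pick_build())
```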

Model Overview

| Property | Value |
|---|---|
| Base model | deepreinforce-ai/Ornith-1.0-9B |
| Total parameters | ~9B (from the model name; the card gives no exact figure) |
| Layers | 32 |
| Context length | 262,144 tokens |
| Architecture | qwen3_5 dense causal LM, post-trained on Gemma 4 and Qwen 3.5 |
| This repo | MLX 5-bit quant for Apple Silicon (~6.2 GB), built from the original weights |

Ornith 1.0 9B benchmarks

Scores are DeepReinforce's published results for the full-precision base deepreinforce-ai/Ornith-1.0-9B. MLX quants run the same model locally; lower bit-widths trade a little accuracy for size/speed.

MLX quants in this series

4-bit · 5-bit ← this · 6-bit · 8-bit
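To get a feel for how the four bit-widths compare in size, the sketch below does the arithmetic. It rests on assumptions that are not stated on this card: exactly 9 billion parameters, every weight quantized, and one 16-bit scale plus one 16-bit bias stored per group of 64 weights. `estimated_size_gb` is a hypothetical helper, and real files will differ somewhat.

```python
def estimated_size_gb(num_params, bits, group_size=64, overhead_bits_per_group=32):
    """Rough size of a group-quantized model in decimal gigabytes."""
    # Assumption: each group shares one scale and one bias, 16 bits each,
    # so the overhead is spread across group_size weights.
    bits_per_weight = bits + overhead_bits_per_group / group_size
    return num_params * bits_per_weight / 8 / 1e9


for bits in (4, 5, 6, 8):
    print(f"{bits}-bit: ~{estimated_size_gb(9e9, bits):.1f} GB")
```

Under these assumptions the 5-bit estimate comes out close to the ~6.2 GB quoted for this repo.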

Run on Apple Silicon

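# Install mlx-lm, then generate from the command line.
pip install mlx-lm
mlx_lm.generate --model AtomicChat/ornith-9b-MLX-5bit --prompt "Write a quicksort in Python" --max-tokens 512

# Or generate from Python.
from mlx_lm import load, generate

model, tokenizer = load("AtomicChat/ornith-9b-MLX-5bit")
# Apply the model's chat template before generating.
msg = [{"role": "user", "content": "Write a quicksort in Python"}]
prompt = tokenizer.apply_chat_template(msg, add_generation_prompt=True)
# verbose=True already streams the reply to stdout; the full text is also returned.
text = generate(model, tokenizer, prompt=prompt, max_tokens=512, verbose=True)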

Or open it in Atomic Chat: search for AtomicChat/ornith-9b-MLX-5bit and click "Use this model".

Recommended sampling

| Parameter | Value |
|---|---|
| temperature | 0.6 |
| top_p | 0.95 |
| top_k | 20 |

DeepReinforce's recommended sampling parameters. The card notes that temperature=1.0 reproduces the reported benchmark setup.
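One way to apply these values with mlx-lm is sketched below. It assumes a recent mlx-lm release in which `mlx_lm.sample_utils.make_sampler` accepts `temp`, `top_p`, and `top_k`, and in which `generate` accepts a `sampler` argument. `generate_with_recommended_sampling` is a hypothetical wrapper written for this example.

```python
# DeepReinforce's recommended sampling values, as listed on this card.
RECOMMENDED_SAMPLING = {"temp": 0.6, "top_p": 0.95, "top_k": 20}


def generate_with_recommended_sampling(user_message, max_tokens=512):
    """Generate a reply using the recommended sampling settings."""
    # Imported lazily so the settings can be inspected without MLX installed.
    from mlx_lm import load, generate
    from mlx_lm.sample_utils import make_sampler

    model, tokenizer = load("AtomicChat/ornith-9b-MLX-5bit")
    messages = [{"role": "user", "content": user_message}]
    prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)
    # Assumption: make_sampler takes temp, top_p and top_k keyword arguments.
    sampler = make_sampler(**RECOMMENDED_SAMPLING)
    return generate(
        model, tokenizer, prompt=prompt, max_tokens=max_tokens, sampler=sampler
    )
```

On an Apple Silicon Mac, `generate_with_recommended_sampling("Write a quicksort in Python")` would return the reply as a string. To match the reported benchmark setup instead, set `temp` to 1.0.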

How this was made

  1. Download deepreinforce-ai/Ornith-1.0-9B (original weights).
  2. Convert and quantize to MLX with mlx_lm.convert --hf-path deepreinforce-ai/Ornith-1.0-9B -q --q-bits 5 --q-group-size 64.
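The two steps above can be sketched as a single command built in Python. The output directory name `ornith-9b-MLX-5bit` is an assumption chosen for illustration, and `build_convert_command` is a hypothetical helper. `mlx_lm.convert` fetches the source weights itself when given a Hub repo id through `--hf-path`.

```python
import shlex


def build_convert_command(hf_path, mlx_path, bits=5, group_size=64):
    """Build the mlx_lm.convert argument list for a quantized conversion."""
    return [
        "mlx_lm.convert",
        "--hf-path", hf_path,    # source weights (Hub repo id or local path)
        "--mlx-path", mlx_path,  # output directory (name assumed for this example)
        "-q",                    # enable quantization
        "--q-bits", str(bits),
        "--q-group-size", str(group_size),
    ]


cmd = build_convert_command("deepreinforce-ai/Ornith-1.0-9B", "ornith-9b-MLX-5bit")
print(shlex.join(cmd))
# To run the conversion on an Apple Silicon Mac with mlx-lm installed:
#   import subprocess; subprocess.run(cmd, check=True)
```

Changing `bits` to 4, 6, or 8 would give the commands for the other quants in the series, assuming they were built with the same group size.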

License

Released by DeepReinforce under the MIT license, globally accessible with no regional limits. Quantized to MLX by Atomic Chat.
