Atomic Chat Join Discord GitHub

Ornith 1.0 35B

Ornith 1.0 35B, self-quantized to GGUF by Atomic Chat. Built straight from DeepReinforce's original weights with a per-tensor importance matrix. Runs fully offline.

Highlights

  • A self-improving open-source family for agentic coding from DeepReinforce, built for tool-calling and terminal-based coding agents.
  • Sparse Mixture-of-Experts: 256 routed experts with 8 active per token plus a shared expert, across 40 layers (qwen3_5_moe).
  • Post-trained on top of Gemma 4 and Qwen 3.5, the mid-size member of the Ornith 1.0 lineup.
  • Strong agentic coding scores: 75.6 on SWE-bench Verified and 64.2 on Terminal-Bench 2.1 (Terminus-2).
  • 262,144-token native context for long files and multi-step agent traces.
  • Pure open: MIT licensed, globally accessible with no regional limits.
  • Full quant ladder with an importance matrix on every quant over calibration_datav3.

These GGUFs are self-quantized from the original weights, not a repack. The importance matrix keeps low-bit quants closer to the full-precision model.

Always pass --jinja so the Ornith 1.0 35B chat template is applied. Without it the model can emit malformed turns.

Model Overview

Property Value
Base model deepreinforce-ai/Ornith-1.0-35B
Total parameters ~35B total (MoE; model name). Active per token not stated
Layers 40
Experts 256 routed + 1 shared, 8 active per token
Context length 262,144
Architecture qwen3_5_moe sparse MoE, post-trained on Gemma 4 and Qwen 3.5
This repo GGUF quants (imatrix), full ladder from the original weights
Ornith 1.0 35B benchmark scores

Scores are DeepReinforce's published results for the base deepreinforce-ai/Ornith-1.0-35B. These are full-precision scores; the quants here run the same model locally. Quantization preserves the large majority of this, with Q4_K_M and up sitting within a point or two of full precision.

Choosing a quant

Quant Size Notes
Q4_K_M 21.2 GB Recommended default. Best balance of size, speed and quality.
UD-Q4_K_XL 21.5 GB Dynamic. Token embeddings and output kept at Q8_0 for higher quality at a Q4 footprint.
Q5_K_M 24.7 GB Higher quality, low loss.
Q6_K 28.5 GB Near lossless.
Q8_0 36.9 GB Effectively lossless, reference quality.

Pick the largest file that fits your (V)RAM with room for context. Q4_K_M or UD-Q4_K_XL is the sweet spot for most setups; Q6_K or Q8_0 for maximum fidelity. As an MoE the routed experts dominate the file size, so the quants are smaller than a dense 35B.

Get started

Run Ornith 1.0 35B locally with:

  • Atomic Chat: the easiest path. Open the app, search AtomicChat/ornith-35b-GGUF, pick a quant, hit Use this model.
  • llama.cpp: llama-server -hf AtomicChat/ornith-35b-GGUF:Q4_K_M --jinja -c 8192
  • Ollama: ollama run hf.co/AtomicChat/ornith-35b-GGUF:Q4_K_M
  • LM Studio / Jan: search the repo id, download any quant.

Best practices

Parameter Value
temperature 0.6
top_p 0.95
top_k 20

DeepReinforce's recommended sampling parameters. The card notes that temperature=1.0 reproduces the reported benchmark setup.

Run in llama.cpp

git clone https://github.com/ggerganov/llama.cpp
cmake llama.cpp -B llama.cpp/build -DBUILD_SHARED_LIBS=OFF -DGGML_CUDA=ON
cmake --build llama.cpp/build --config Release -j --target llama-cli llama-server
./llama.cpp/build/bin/llama-server \
    -hf AtomicChat/ornith-35b-GGUF:UD-Q4_K_XL \
    --jinja -ngl 99 -c 8192 -fa on

How these were made

  1. Download deepreinforce-ai/Ornith-1.0-35B (original weights).
  2. Convert to f16 GGUF with llama.cpp; the NextN/MTP block is stripped before quantizing.
  3. Build an importance matrix over calibration_datav3 with llama-imatrix.
  4. Quantize the full ladder with --imatrix.
  5. UD-Q4_K_XL additionally pins the token-embedding and output tensors to Q8_0.

License

Released by DeepReinforce under the MIT license, globally accessible with no regional limits. Quantized by Atomic Chat.

Downloads last month
41
GGUF
Model size
35B params
Architecture
qwen35moe
Hardware compatibility
Log In to add your hardware

4-bit

5-bit

6-bit

8-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for AtomicChat/ornith-35b-GGUF

Quantized
(27)
this model

Collection including AtomicChat/ornith-35b-GGUF