Ornith-1.0-35B โ€” APEX GGUF

APEX (Adaptive Precision for EXpert Models) quantizations of Ornith-1.0-35B, an open-source coding MoE model by DeepReinforce (MIT license, based on Qwen 3.5 architecture).

These quants were produced using the apex-quant toolchain. APEX is a MoE-aware mixed-precision quantization strategy that classifies tensors by role (routed expert, shared expert, attention) and applies a layer-wise precision gradient โ€” edge layers get higher precision, middle layers more aggressive compression.

Files

Each profile comes in two variants:

  • Base โ€” the quantized model standalone
  • -MTP โ€” includes the bundled MTP (multi-token prediction) head, quantized to Q8_0 (near-lossless), for self-speculative decoding via --spec-type draft-mtp. Requires a recent llama.cpp build with MTP support.

I-variants were calibrated with a diverse importance matrix (chat, code, reasoning, tool-calling, multilingual) for improved downstream accuracy.

File Profile Size Best For
ornith-1.0-35b-APEX-I-Mini.gguf I-Mini 14 GB Smallest viable, fastest inference
ornith-1.0-35b-APEX-I-Mini-MTP.gguf I-Mini + MTP 14 GB Smallest viable + self-spec
ornith-1.0-35b-APEX-Compact.gguf Compact 17 GB Consumer GPUs, general purpose
ornith-1.0-35b-APEX-Compact-MTP.gguf Compact + MTP 17 GB Consumer GPUs + self-spec
ornith-1.0-35b-APEX-I-Compact.gguf ๐Ÿ† I-Compact 17 GB Consumer GPUs, best quality at this size
ornith-1.0-35b-APEX-I-Compact-MTP.gguf ๐Ÿ† I-Compact + MTP 17 GB Consumer GPUs, best quality + self-spec
ornith-1.0-35b-APEX-Quality.gguf Quality 22 GB Highest quality standard
ornith-1.0-35b-APEX-Quality-MTP.gguf Quality + MTP 23 GB Highest quality + self-spec
ornith-1.0-35b-APEX-I-Quality.gguf I-Quality 22 GB Highest quality with imatrix
ornith-1.0-35b-APEX-I-Quality-MTP.gguf I-Quality + MTP 23 GB Highest quality + imatrix + self-spec
ornith-1.0-35b-APEX-Balanced.gguf Balanced 24 GB General purpose, best trade-off
ornith-1.0-35b-APEX-Balanced-MTP.gguf Balanced + MTP 25 GB General purpose + self-spec
ornith-1.0-35b-APEX-I-Balanced.gguf ๐Ÿ† I-Balanced 24 GB Best overall โ€” lowest KL divergence
ornith-1.0-35b-APEX-I-Balanced-MTP.gguf ๐Ÿ† I-Balanced + MTP 25 GB Best overall + self-spec

Profile Precision Breakdown

APEX applies a layer-wise precision gradient to MoE expert weights. I-variants additionally use a diverse imatrix (chat, code, reasoning, tool-calling) that improves downstream accuracy and lowers KL divergence.

Profile Edge (blk 0-4, 35-39) Near-Edge (blk 5-9, 30-34) Middle (blk 10-29) Shared Expert Attention
Quality Q6_K Q5_K IQ4_XS Q8_0 Q6_K
Balanced Q6_K Q5_K Q5_K Q8_0 Q6_K
Compact Q4_K Q3_K Q3_K Q6_K Q4_K
Mini Q3_K_M Q3_K_M IQ2_S Q4_K Q3_K_M

Quality and Mini use a 3-tier gradient. Balanced and Compact use a simpler 2-tier gradient (edge vs. middle) โ€” their "Near-Edge" and "Middle" columns are the same precision.

MTP Head

The bundled MTP head (blk.40.* including the nextn.* projection + norms) is quantized to Q8_0 (near-lossless) for high draft accuracy. Enable with:

llama-server -m ornith-1.0-35b-APEX-...-MTP.gguf --spec-type draft-mtp

Usage Examples

llama.cpp server (basic)

llama-server \
  -m ornith-1.0-35b-APEX-I-Compact.gguf \
  -ngl 99 \
  -c 32768 \
  --flash-attn on \
  --temp 0.6 \
  --top-p 0.95

With self-speculative decoding (MTP variants)

llama-server \
  -m ornith-1.0-35b-APEX-I-Compact-MTP.gguf \
  --spec-type draft-mtp \
  -ngl 99 \
  -c 32768 \
  --flash-attn on

llama.cpp server with vision

Ornith has a built-in vision encoder. Vision support in GGUF format is experimental โ€” if a compatible mmproj becomes available, pass it with --mmproj.

Hardware Notes

Profile Minimum VRAM Recommended VRAM
I-Mini 16 GB 24 GB
Compact / I-Compact 20 GB 24 GB
Quality / I-Quality 24 GB 32 GB
Balanced / I-Balanced 24 GB (tight) 32 GB+

Acknowledgements

Downloads last month
-
GGUF
Model size
35B params
Architecture
qwen35moe
Hardware compatibility
Log In to add your hardware

We're not able to determine the quantization variants.

Inference Providers NEW
This model isn't deployed by any Inference Provider. ๐Ÿ™‹ Ask for provider support

Model tree for Tribbler/ornith-1.0-apex

Quantized
(22)
this model