Osaurus

Ornith-1.0-35B · MXFP8

Official OsaurusAI MXFP8 build of deepreinforce-ai/Ornith-1.0-35B (MIT) — a vision-language MoE on a Qwen3.5 hybrid backbone. Near-lossless 8-bit microscaled FP; runs on Apple Silicon via Osaurus / mlx.

  • ~34 GB bundle (down from ~70 GB bf16).
  • MXFP8: microscaled FP8 (group size 32) applied to the language-model linear weights and routed experts; the vision tower, short-conv kernels, and norms are kept at fp16.
  • Vision-language (image + text → text).
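
As a rough illustration of what microscaling with group size 32 means, the sketch below quantizes one group in pure Python: the group shares a single power-of-two scale, and each weight is rounded to an FP8 value. The E4M3 element type and the 8-bit shared exponent follow the OCP Microscaling convention and are assumptions here, since this card only states "MXFP8, group-size 32". The function names are hypothetical, and this is not the MLX kernel.

```python
import math

GROUP_SIZE = 32      # from the card: group size 32
E4M3_MAX = 448.0     # assumption: elements are FP8 E4M3 (largest finite value 448)
E4M3_EMAX = 8        # exponent of the largest E4M3 binade
E4M3_EMIN = -6       # exponent of the smallest normal E4M3 binade
MANTISSA_BITS = 3


def round_e4m3(x):
    """Round x to the nearest E4M3-representable value, saturating at +/-448."""
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    a = min(abs(x), E4M3_MAX)
    # frexp gives a = m * 2**e with 0.5 <= m < 1, so floor(log2(a)) == e - 1.
    exp = max(math.frexp(a)[1] - 1, E4M3_EMIN)
    step = 2.0 ** (exp - MANTISSA_BITS)  # spacing of representable values here
    return sign * min(round(a / step) * step, E4M3_MAX)


def quantize_group(weights):
    """Return (shared_exp, codes) for one group; codes are kept as floats for clarity."""
    amax = max(abs(w) for w in weights)
    if amax == 0.0:
        return 0, [0.0] * len(weights)
    # Shared power-of-two scale: align the group's largest value with E4M3's top binade.
    shared_exp = (math.frexp(amax)[1] - 1) - E4M3_EMAX
    scale = 2.0 ** shared_exp
    return shared_exp, [round_e4m3(w / scale) for w in weights]


def dequantize_group(shared_exp, codes):
    """Undo the shared scale to recover approximate weights."""
    scale = 2.0 ** shared_exp
    return [c * scale for c in codes]


# One 8-bit code per weight plus one 8-bit shared exponent per group (assumption).
bits_per_weight = 8 + 8 / GROUP_SIZE

weights = [0.01 * (i + 1) for i in range(GROUP_SIZE)]
shared_exp, codes = quantize_group(weights)
restored = dequantize_group(shared_exp, codes)
max_rel_err = max(abs(r - w) / abs(w) for r, w in zip(restored, weights))
print(f"bits/weight={bits_per_weight}, shared_exp={shared_exp}, max_rel_err={max_rel_err:.4f}")
```

Under these assumptions the quantized tensors cost 8.25 bits per weight, and the tensors kept at fp16 add to that.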

Architecture

Family: qwen3_5_moe (hybrid)
Text layers: 40 (30 Gated-DeltaNet linear-attention + 10 full-attention)
Experts: 256 routed (stacked switch_mlp) · hidden 2048 · untied lm_head
Vision: ViT tower (model.visual) preserved at fp16
Cache: hybrid (GDN state + KV for attention layers)
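
To picture the hybrid cache, the sketch below lays out the 40 text layers and notes which kind of cache each would hold. It assumes a full-attention layer at every fourth position, which is consistent with the 30/10 split above but is not stated in this card; the constant and function names are hypothetical.

```python
NUM_LAYERS = 40              # from the card
FULL_ATTENTION_INTERVAL = 4  # assumption: every 4th layer is full attention


def layer_types(num_layers=NUM_LAYERS, interval=FULL_ATTENTION_INTERVAL):
    """Return the per-layer type under the assumed interleaving."""
    return [
        "full_attention" if (i + 1) % interval == 0 else "linear_attention"
        for i in range(num_layers)
    ]


types = layer_types()

# Full-attention layers keep a KV cache that grows with sequence length;
# Gated-DeltaNet layers keep a fixed-size recurrent state instead.
cache_kinds = ["kv" if t == "full_attention" else "gdn_state" for t in types]
kv_layers = [i for i, kind in enumerate(cache_kinds) if kind == "kv"]

print(types.count("linear_attention"), "linear-attention layers")
print(types.count("full_attention"), "full-attention layers at indices", kv_layers)
```

Only the full-attention layers contribute memory that scales with context length, which is the practical consequence of the hybrid cache.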

Usage

# text
python -m mlx_lm generate --model OsaurusAI/Ornith-1.0-35B-MXFP8 --prompt "Explain a hash map in two sentences."

For image+text, load in Osaurus or an MLX-VLM runtime that supports qwen3_5 vision.

Provenance

Safetensors · MLX
Model size: 10B params
Tensor types: U32, U8, F16