Daimon-R πŸ€–

Daimon-R

The flagship local brain of Daimon β€” reasoning + coding. Runs on a GPU via llama.cpp.

Daimon-R is the flagship brain of Daimon β€” a local-first personal AI assistant. Daimon-R = Qwen/Qwen3-4B-Instruct-2507 (quantized GGUF) + a Daimon LoRA, served locally with llama.cpp. The weights are open β€” use them standalone today.

What the Daimon LoRA adds

The LoRA tunes persona and behavior, not raw coding (that stays at the base's level):

  • A consistent Daimon identity (first person, local-first assistant) instead of a generic "language model" voice.
  • Canvas convention: when you ask it to build a web/app/UI it replies with a single self-contained html block that renders live β€” no preamble, no copy-paste.
  • Concise Rioplatense Spanish for chat, plus a voice-friendly register for TTS.

Benchmarks (real, measured locally)

Daimon uses this model to pick JSON tool-call actions for its browser co-pilot, computer-use, and code-editing agents (e.g. {"action":"click","ref":3}). Measured on 198 held-out synthetic scenarios β€” disjoint vocabulary/phrasing from training, zero string-level overlap with the training set, graded objectively (not by text similarity):

Metric Base Qwen3-4B + identity-only LoRA Daimon-R
Valid JSON 97.5% 100.0%
Correct action chosen 65.8% 94.4%
Correct action + correct fields (ref/path/command/...) 39.2% 94.4%

Persona/behavior suite (identity, canvas-HTML convention, Rioplatense style, no-refusal): 12/12, no regression from the identity-only checkpoint.

We also tried fine-tuning on Lucas's own commit history to reproduce his exact diffs (3 attempts, different data/hyperparameter fixes each time) β€” it never beat the base model's overlap with held-out commits, so that LoRA was never shipped. We only publish results that actually win a real, pre-registered comparison.

Run standalone

huggingface-cli download lucas-mella/Daimon-R --local-dir ./daimon-models
# llama.cpp: load the base GGUF and apply the Daimon LoRA
llama-server -m ./daimon-models/Qwen3-4B-Instruct-2507-Q4_K_M.gguf \
  --lora ./daimon-models/daimon-r-lora-f16.gguf -c 8192

Exposes an OpenAI-compatible endpoint (default http://localhost:8080/v1).

Files

  • Qwen3-4B-Instruct-2507-Q4_K_M.gguf
  • daimon-r-lora-f16.gguf

Daimon β€” coming soon

These weights are the brain of Daimon, a local-first personal assistant that runs on your own machine: real-time local voice, a co-pilot browser that Daimon and you share, a canvas for apps & prototypes, and hybrid local/cloud routing. The full app isn't public yet β€” these open models are a complement you can already build on. Watch @lucas-mella for the release.

Downloads last month
36
GGUF
Model size
33M params
Architecture
qwen3
Hardware compatibility
Log In to add your hardware

4-bit

16-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. πŸ™‹ Ask for provider support

Model tree for lucas-mella/Daimon-R

Adapter
(5585)
this model