# NeoToi Coder v3.1 - 4B
A Rust / Dioxus 0.7 specialist fine-tuned from Qwen3-4B (4.0B parameters, 3.6B non-embedding, tied input/output embeddings) using RAFT (Retrieval-Augmented Fine-Tuning). Optimized for production-quality Dioxus 0.7 components with Tailwind v4 and WCAG 2.2 AAA accessibility.
This is the 4B variant of the v3.1 release: about half the size of the 8B variant (rockypod/neotoi-coder-8b), with ~40% faster generation and a marginally lower spec-exam grade. The legacy 14B is at rockypod/neotoi-coder, and the family hub linking all three is on the same page.
## Exam Results - 103-question Dioxus 0.7 Spec Exam
Re-graded 2026-04-26 with the patched grader (`run_grade_v31.py`, accepts `LANG()`/`THEME()` GlobalSignal accessor calls on Q87).
| Tier | Name | Cnt | Raw | Wtd | Max | Rate | Floor | Status |
|---|---|---|---|---|---|---|---|---|
| T1 | Fundamentals | 12 | 11 | 11.0 | 12.0 | 91.7% | 82% | ✅ |
| T2 | RSX Syntax | 12 | 12 | 12.0 | 12.0 | 100.0% | 82% | ✅ |
| T3 | Signal Hygiene | 12 | 12 | 12.0 | 12.0 | 100.0% | 82% | ✅ |
| T4 | WCAG / ARIA | 14 | 14 | 21.0 | 21.0 | 100.0% | 82% | ✅ |
| T5 | use_resource | 8 | 8 | 12.0 | 12.0 | 100.0% | 82% | ✅ |
| T6 | Hard Reasoning | 10 | 10 | 20.0 | 20.0 | 100.0% | 88% | ✅ |
| T7 | Primitives + CSS | 12 | 12 | 18.0 | 18.0 | 100.0% | 82% | ✅ |
| T8 | GlobalSignal / i18n | 8 | 8 | 12.0 | 12.0 | 100.0% | 82% | ✅ |
| T9 | Static Navigator | 6 | 6 | 9.0 | 9.0 | 100.0% | 82% | ✅ |
| T10 | Dioxus 0.7.4 | 6 | 6 | 12.0 | 12.0 | 100.0% | 88% | ✅ |
| T11 | Server Functions | 3 | 3 | 4.5 | 4.5 | 100.0% | 82% | ✅ |
| | Overall | 103 | 102 | 143.5 | 144.5 | 99.31% | | ✅ PASS |
- Publication bar (90%): PASS
- Release bar (95%): PASS
- Tier floors: PASS
Single failure: Q8 (T1 RSX conversion) - generation truncated mid-`<think>` block, so no answer body was produced. Real model failure, not a grader artifact.
## Version History
| Version | Base (params) | Score | Exam | Dataset | Status |
|---|---|---|---|---|---|
| v1.0 | Qwen3-Coder-14B (14.8B) | 51/60 (85.0%) | 60Q standard | โ | Published |
| v2.0 | Qwen3-Coder-14B (14.8B) | 135.5/140 (96.8%) | 100Q weighted | 4,185 | Published |
| v3.0 | Qwen3-Coder-14B (14.8B) | 124.0/144.5 (85.8%) | 103Q weighted | 4,535 | Published |
| v3.1 | Qwen3-Coder-14B (14.8B) | 137.0/144.5 (94.81%) | 103Q weighted | 4,880 | Published |
| v3.1 | Qwen3-8B (8.2B) | 144.5/144.5 (100.00%) | 103Q weighted | 4,880 | Published |
| v3.1 | Qwen3-4B (4.0B) | 143.5/144.5 (99.31%) | 103Q weighted | 4,880 | This release |
## Model Details
- Base model: Qwen/Qwen3-4B (4.0B parameters total, 3.6B non-embedding, tied input/output embeddings)
- Method: RAFT (Retrieval-Augmented Fine-Tuning) with LoRA adapters
- Dataset: 4,880 curated Dioxus 0.7 examples across 43 topics
- Scope: Rust + Dioxus 0.7 + Tailwind v4 + WCAG 2.2 AAA
- Quantization: Q4_K_M (~2.33 GB)
- Thinking tokens: patched (`qwen3.thinking = true`)
- Author: Kevin Miller, Jr.
## Training
| Field | Value |
|---|---|
| Steps | 2,440 |
| Epochs | 4 |
| Wall time | ~1h 49m |
| Final train loss | 0.4724 |
| LoRA rank | 16 (alpha 32, dropout 0) |
| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Sequence length | 2048 |
| Precision | bf16 + 4-bit base |
| Hardware | RTX 3090 Ti (24 GB) |
## Files
- `neotoi-coder-v3.1-4b-q4_k_m.gguf` - Q4_K_M quant (~2.33 GB)
- `neotoi-coder-v3.1-4b-q4_k_m_patched.gguf` - same quant + `qwen3.thinking=true` patch (recommended for Ollama / LM Studio)
## Enabling Thinking Mode
This model emits Qwen3-native `<think>...</think>` blocks. Thinking is on by default with the patched GGUF on inference backends that honor `qwen3.thinking`.
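If you consume completions programmatically, you may want to drop the reasoning before rendering the answer. A minimal sketch in Rust, assuming the template has already opened the `<think>` tag so the completion closes it with `</think>` before the answer body (the `strip_think` helper name is illustrative, not part of the model or its tooling):

```rust
/// Return the answer body of a raw completion, dropping any reasoning that
/// precedes the closing </think> tag. Falls back to the full text when no
/// reasoning block is present.
fn strip_think(raw: &str) -> &str {
    match raw.rfind("</think>") {
        Some(end) => raw[end + "</think>".len()..].trim_start(),
        None => raw.trim_start(),
    }
}

fn main() {
    let raw = "I should use rsx! braces.</think>Use rsx! { div { \"hi\" } }.";
    assert_eq!(strip_think(raw), "Use rsx! { div { \"hi\" } }.");
}
```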
### Ollama
```
FROM neotoi-coder-v3.1-4b-q4_k_m_patched.gguf
PARAMETER temperature 0.2
PARAMETER num_predict 2000
PARAMETER num_ctx 8192
PARAMETER repeat_penalty 1.1
PARAMETER stop "<|im_end|>"
TEMPLATE """<|im_start|>system
{{ .System }}<|im_end|>
<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
<think>
"""
SYSTEM You are NeoToi, an expert Rust and Dioxus 0.7 developer specialized in Tailwind v4 and WCAG 2.2 AAA accessibility. Always think step-by-step before answering.
```

```
ollama create neotoi-coder:4b -f Modelfile
ollama run neotoi-coder:4b
```
### LM Studio
| Field | Value |
|---|---|
| Before System | <|im_start|>system |
| After System | <|im_end|> |
| Before User | <|im_start|>user |
| After User | <|im_end|> |
| Before Assistant | <|im_start|>assistant\n<think> |
| After Assistant | <|im_end|> |
### llama.cpp
```
./llama-cli \
  -m neotoi-coder-v3.1-4b-q4_k_m_patched.gguf \
  -ngl 99 \
  --temp 0.2 \
  -p "<|im_start|>user\nYour question here<|im_end|>\n<|im_start|>assistant\n<think>"
```
## What It Knows
- Dioxus 0.7 RSX brace syntax - never function-call style
- `use_signal`, `use_resource` with the correct three-arm match (see the sketch after this list)
- `r#for` on label elements only, never inputs
- WCAG 2.2 AAA: `aria_labelledby`, `aria_describedby`, `role="alert"`, `role="dialog"`, live regions
- dioxus-primitives - no manual ARIA on managed components
- `styles!()` macro for CSS modules
- Tailwind v4 utility classes
- `GlobalSignal` patterns (LANG / THEME), i18n, dark-mode toggling
- Dioxus 0.7.4 APIs: `WritableResultExt`, WebSocket Stream + Sink, server-fn extractors
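As a concrete illustration of those patterns, here is a minimal sketch of the kind of component the model targets. It is hand-written for this card, not model output: it assumes the Dioxus prelude and Tailwind v4 are available, and `fetch_greeting` is a hypothetical async helper standing in for a real server function.

```rust
use dioxus::prelude::*;

// Minimal sketch: a counter plus an async resource rendered with the
// three-arm match described above.
#[component]
fn Greeting() -> Element {
    // Signal hygiene: local state lives in a signal.
    let mut count = use_signal(|| 0);
    // use_resource re-runs whenever `count` changes, because it is read inside.
    let greeting = use_resource(move || async move { fetch_greeting(count()).await });

    // Three-arm match: loading, success, error.
    let body = match &*greeting.read_unchecked() {
        Some(Ok(text)) => rsx! { p { role: "status", "{text}" } },
        Some(Err(err)) => rsx! { p { role: "alert", "Error: {err}" } },
        None => rsx! { p { "Loading..." } },
    };

    rsx! {
        // RSX brace syntax, Tailwind v4 utilities, explicit ARIA label.
        button {
            class: "rounded bg-blue-700 px-4 py-2 text-white",
            aria_label: "Refresh greeting",
            onclick: move |_| count += 1,
            "Refresh ({count})"
        }
        {body}
    }
}

// Hypothetical stand-in for a real server function or HTTP call.
async fn fetch_greeting(n: i32) -> Result<String, String> {
    Ok(format!("Hello, visitor #{n}"))
}
```

Note that dioxus-primitives components manage their own ARIA, so the explicit `aria_label` above applies only to plain elements like this raw `button`.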
## What It Does Not Know
- Playwright / E2E testing (out of scope)
- Non-Dioxus web frameworks
- Backends or databases beyond what server functions cover
- Occasional generation truncation on simple RSX-conversion prompts when the `<think>` block runs long (observed once on Q8 of the spec exam)
## 4B vs 8B
| Metric | 4B | 8B |
|---|---|---|
| Base model parameters | 4.0B (3.6B non-embed) | 8.2B (6.95B non-embed) |
| Q4_K_M file size | ~2.33 GB | ~4.68 GB |
| Exam score | 102 / 103 | 103 / 103 |
| Weighted score | 143.5 / 144.5 (99.31%) | 144.5 / 144.5 (100%) |
| Exam wall time | 6.9 min | 10.3 min |
| Generation throughput | ~184 t/s | ~132 t/s |
The 4B is the recommended pick when disk space or RAM is tight; the 8B is the safer default if you need every last fundamentals point.
## Transparency
Per-question model outputs and the patched grader source are published alongside the weights:
- Weights: HuggingFace - rockypod/neotoi-coder-4b
- Family hub (8B / 4B / 14B comparison): rockypod/neotoi-coder
- Exam runner, grader, per-question results: GitHub - rockypod/neotoi-coder
- Ollama: `ollama pull rockypod/neotoi-coder:4b`

The training dataset itself is not redistributed; see the GitHub repo for the data-generation pipeline.
License & Attribution
Fine-tuned weights and dataset: licensed under the Neotoi Coder Community License v1.0 (see LICENSE). Commercial use of model outputs permitted. Weight redistribution prohibited. Mental health deployment requires written permission.
Upstream models: the base model and teacher model are licensed under the Apache License, Version 2.0 (see LICENSE-APACHE and NOTICE):
- Base: Qwen3-4B, © Alibaba Cloud
- Teacher: Qwen3-Coder-Next 80B, © Alibaba Cloud
The Neotoi Coder 4B weights are a derivative work of Qwen3-4B, fine-tuned via LoRA adapters on the Neotoi Coder RAFT dataset and then merged + quantized to GGUF.
## Credits
- Unsloth - 2× faster fine-tuning
- TRL - SFTTrainer
- Qwen3-4B - base model
- Dioxus - the framework this model specializes in
- Claude Code - dataset pipeline and training infrastructure