# NeoToi Coder v3.1 - 4B
A Rust / Dioxus 0.7 specialist fine-tuned from Qwen3-4B (4.0B parameters, 3.6B non-embedding, tied input/output embeddings) using RAFT (Retrieval-Augmented Fine-Tuning). Optimized for production-quality Dioxus 0.7 components with Tailwind v4 and WCAG 2.2 AAA accessibility.
This is the 4B variant of the v3.1 release: about half the size of the 8B variant (rockypod/neotoi-coder-8b), with ~40% faster generation and a marginally lower spec-exam grade. The legacy 14B is at rockypod/neotoi-coder, and the family hub linking all three is on the same page.
## Exam Results - 103-question Dioxus 0.7 Spec Exam
Re-graded 2026-04-26 with the patched grader (`run_grade_v31.py`, accepts `LANG()`/`THEME()` GlobalSignal accessor calls on Q87).
| Tier | Name | Cnt | Raw | Wtd | Max | Rate | Floor | Status |
|---|---|---|---|---|---|---|---|---|
| T1 | Fundamentals | 12 | 11 | 11.0 | 12.0 | 91.7% | 82% | ✅ |
| T2 | RSX Syntax | 12 | 12 | 12.0 | 12.0 | 100.0% | 82% | ✅ |
| T3 | Signal Hygiene | 12 | 12 | 12.0 | 12.0 | 100.0% | 82% | ✅ |
| T4 | WCAG / ARIA | 14 | 14 | 21.0 | 21.0 | 100.0% | 82% | ✅ |
| T5 | use_resource | 8 | 8 | 12.0 | 12.0 | 100.0% | 82% | ✅ |
| T6 | Hard Reasoning | 10 | 10 | 20.0 | 20.0 | 100.0% | 88% | ✅ |
| T7 | Primitives + CSS | 12 | 12 | 18.0 | 18.0 | 100.0% | 82% | ✅ |
| T8 | GlobalSignal / i18n | 8 | 8 | 12.0 | 12.0 | 100.0% | 82% | ✅ |
| T9 | Static Navigator | 6 | 6 | 9.0 | 9.0 | 100.0% | 82% | ✅ |
| T10 | Dioxus 0.7.4 | 6 | 6 | 12.0 | 12.0 | 100.0% | 88% | ✅ |
| T11 | Server Functions | 3 | 3 | 4.5 | 4.5 | 100.0% | 82% | ✅ |
| | Overall | 103 | 102 | 143.5 | 144.5 | 99.31% | | ✅ PASS |
- Publication bar (90%): PASS
- Release bar (95%): PASS
- Tier floors: PASS
Single failure: Q8 (T1 RSX conversion) - generation truncated mid-`<think>` block, so no answer body was produced. Real model failure, not a grader artifact.
## Version History
| Version | Base (params) | Score | Exam | Dataset | Status |
|---|---|---|---|---|---|
| v1.0 | Qwen3-Coder-14B (14.8B) | 51/60 (85.0%) | 60Q standard | โ | Published |
| v2.0 | Qwen3-Coder-14B (14.8B) | 135.5/140 (96.8%) | 100Q weighted | 4,185 | Published |
| v3.0 | Qwen3-Coder-14B (14.8B) | 124.0/144.5 (85.8%) | 103Q weighted | 4,535 | Published |
| v3.1 | Qwen3-Coder-14B (14.8B) | 137.0/144.5 (94.81%) | 103Q weighted | 4,880 | Published |
| v3.1 | Qwen3-8B (8.2B) | 144.5/144.5 (100.00%) | 103Q weighted | 4,880 | Published |
| v3.1 | Qwen3-4B (4.0B) | 143.5/144.5 (99.31%) | 103Q weighted | 4,880 | This release |
## Model Details
- Base model: Qwen/Qwen3-4B (4.0B parameters total, 3.6B non-embedding, tied input/output embeddings)
- Method: RAFT (Retrieval-Augmented Fine-Tuning) with LoRA adapters
- Dataset: 4,880 curated Dioxus 0.7 examples across 43 topics
- Scope: Rust + Dioxus 0.7 + Tailwind v4 + WCAG 2.2 AAA
- Quantization: Q4_K_M (~2.33 GB)
- Thinking tokens: patched (`qwen3.thinking = true`)
- Author: Kevin Miller, Jr.
## Training
| Field | Value |
|---|---|
| Steps | 2,440 |
| Epochs | 4 |
| Wall time | ~1h 49m |
| Final train loss | 0.4724 |
| LoRA rank | 16 (alpha 32, dropout 0) |
| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Sequence length | 2048 |
| Precision | bf16 + 4-bit base |
| Hardware | RTX 3090 Ti (24 GB) |
## Files
- `neotoi-coder-v3.1-4b-q4_k_m.gguf` - Q4_K_M quant (~2.33 GB)
- `neotoi-coder-v3.1-4b-q4_k_m_patched.gguf` - same quant + `qwen3.thinking=true` patch (recommended for Ollama / LM Studio)
## Enabling Thinking Mode
This model emits Qwen3-native `<think>...</think>` blocks. Thinking is on by default with the patched GGUF on inference backends that honor `qwen3.thinking`.
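If you consume completions programmatically, you may want to drop the reasoning before rendering the answer. A minimal sketch in Rust, assuming the template has already opened the `<think>` tag so the completion closes it with `</think>` before the answer body (the `strip_think` helper name is illustrative, not part of the model or its tooling):

```rust
/// Return the answer body of a raw completion, dropping any reasoning that
/// precedes the closing </think> tag. Falls back to the full text when no
/// reasoning block is present.
fn strip_think(raw: &str) -> &str {
    match raw.rfind("</think>") {
        Some(end) => raw[end + "</think>".len()..].trim_start(),
        None => raw.trim_start(),
    }
}

fn main() {
    let raw = "I should use rsx! braces.</think>Use rsx! { div { \"hi\" } }.";
    assert_eq!(strip_think(raw), "Use rsx! { div { \"hi\" } }.");
}
```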
### Ollama
```
FROM neotoi-coder-v3.1-4b-q4_k_m_patched.gguf
PARAMETER temperature 0.2
PARAMETER num_predict 2000
PARAMETER num_ctx 8192
PARAMETER repeat_penalty 1.1
PARAMETER stop "<|im_end|>"
TEMPLATE """<|im_start|>system
{{ .System }}<|im_end|>
<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
<think>
"""
SYSTEM You are NeoToi, an expert Rust and Dioxus 0.7 developer specialized in Tailwind v4 and WCAG 2.2 AAA accessibility. Always think step-by-step before answering.
```

```
ollama create neotoi-coder:4b -f Modelfile
ollama run neotoi-coder:4b
```
### LM Studio
| Field | Value |
|---|---|
| Before System | <|im_start|>system |
| After System | <|im_end|> |
| Before User | <|im_start|>user |
| After User | <|im_end|> |
| Before Assistant | <|im_start|>assistant\n<think> |
| After Assistant | <|im_end|> |
### llama.cpp
```
./llama-cli \
  -m neotoi-coder-v3.1-4b-q4_k_m_patched.gguf \
  -ngl 99 \
  --temp 0.2 \
  -p "<|im_start|>user\nYour question here<|im_end|>\n<|im_start|>assistant\n<think>"
```
## What It Knows
- Dioxus 0.7 RSX brace syntax - never function-call style
- `use_signal`, `use_resource` with the correct three-arm match (see the sketch after this list)
- `r#for` on label elements only, never inputs
- WCAG 2.2 AAA: `aria_labelledby`, `aria_describedby`, `role="alert"`, `role="dialog"`, live regions
- dioxus-primitives - no manual ARIA on managed components
- `styles!()` macro for CSS modules
- Tailwind v4 utility classes
- `GlobalSignal` patterns (LANG / THEME), i18n, dark-mode toggling
- Dioxus 0.7.4 APIs: `WritableResultExt`, WebSocket Stream + Sink, server-fn extractors
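As a concrete illustration of those patterns, here is a minimal sketch of the kind of component the model targets. It is hand-written for this card, not model output: it assumes the Dioxus prelude and Tailwind v4 are available, and `fetch_greeting` is a hypothetical async helper standing in for a real server function.

```rust
use dioxus::prelude::*;

// Minimal sketch: a counter plus an async resource rendered with the
// three-arm match described above.
#[component]
fn Greeting() -> Element {
    // Signal hygiene: local state lives in a signal.
    let mut count = use_signal(|| 0);
    // use_resource re-runs whenever `count` changes, because it is read inside.
    let greeting = use_resource(move || async move { fetch_greeting(count()).await });

    // Three-arm match: loading, success, error.
    let body = match &*greeting.read_unchecked() {
        Some(Ok(text)) => rsx! { p { role: "status", "{text}" } },
        Some(Err(err)) => rsx! { p { role: "alert", "Error: {err}" } },
        None => rsx! { p { "Loading..." } },
    };

    rsx! {
        // RSX brace syntax, Tailwind v4 utilities, explicit ARIA label.
        button {
            class: "rounded bg-blue-700 px-4 py-2 text-white",
            aria_label: "Refresh greeting",
            onclick: move |_| count += 1,
            "Refresh ({count})"
        }
        {body}
    }
}

// Hypothetical stand-in for a real server function or HTTP call.
async fn fetch_greeting(n: i32) -> Result<String, String> {
    Ok(format!("Hello, visitor #{n}"))
}
```

Note that dioxus-primitives components manage their own ARIA, so the explicit `aria_label` above applies only to plain elements like this raw `button`.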
## What It Does Not Know
- Playwright / E2E testing (out of scope)
- Non-Dioxus web frameworks
- Backends or databases beyond what server functions cover
- Occasional generation truncation on simple RSX-conversion prompts when the `<think>` block runs long (observed once on Q8 of the spec exam)
## 4B vs 8B
| Metric | 4B | 8B |
|---|---|---|
| Base model parameters | 4.0B (3.6B non-embed) | 8.2B (6.95B non-embed) |
| Q4_K_M file size | ~2.33 GB | ~4.68 GB |
| Exam score | 102 / 103 | 103 / 103 |
| Weighted score | 143.5 / 144.5 (99.31%) | 144.5 / 144.5 (100%) |
| Exam wall time | 6.9 min | 10.3 min |
| Generation throughput | ~184 t/s | ~132 t/s |
The 4B is the recommended pick when disk space or RAM is tight; the 8B is the safer default if you need every last fundamentals point.
## Transparency
Per-question model outputs and the patched grader source are published alongside the weights:
- Weights: HuggingFace - rockypod/neotoi-coder-4b
- Family hub (8B / 4B / 14B comparison): rockypod/neotoi-coder
- Exam runner, grader, per-question results: GitHub - rockypod/neotoi-coder
- Ollama: `ollama pull rockypod/neotoi-coder:4b`

The training dataset itself is not redistributed; see the GitHub repo for the data-generation pipeline.
License & Attribution
Fine-tuned weights and dataset: licensed under the Neotoi Coder Community License v1.0 (see LICENSE). Commercial use of model outputs permitted. Weight redistribution prohibited. Mental health deployment requires written permission.
Upstream models: the base model and teacher model are licensed under the Apache License, Version 2.0 (see LICENSE-APACHE and NOTICE):
- Base: Qwen3-4B, © Alibaba Cloud
- Teacher: Qwen3-Coder-Next 80B, © Alibaba Cloud
The Neotoi Coder 4B weights are a derivative work of Qwen3-4B, fine-tuned via LoRA adapters on the Neotoi Coder RAFT dataset and then merged + quantized to GGUF.
## Credits
- Unsloth - 2× faster fine-tuning
- TRL - SFTTrainer
- Qwen3-4B - base model
- Dioxus - the framework this model specializes in
- Claude Code - dataset pipeline and training infrastructure