
Libraries

How to use evalengine/unbound-q-0.8b with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="evalengine/unbound-q-0.8b")
messages = [
    {"role": "user", "content": "Who are you?"},
]
# Print the result so it is visible when run as a script, not only in a notebook
print(pipe(messages))
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("evalengine/unbound-q-0.8b")
model = AutoModelForCausalLM.from_pretrained("evalengine/unbound-q-0.8b")
messages = [
    {"role": "user", "content": "Who are you?"},
]
# Render the chat template and tokenize it into model-ready tensors
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```
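The slice in the final `decode` call is there because, for a decoder-only model, `generate` returns the prompt tokens followed by the new ones. A minimal sketch of that indexing, with plain Python lists standing in for tensors (the token IDs are made up for illustration and are not real vocabulary entries):

```python
# Hypothetical token IDs; real ones come from the tokenizer and the model.
prompt_ids = [101, 7592, 2088]         # stands in for inputs["input_ids"][0]
generated = prompt_ids + [2023, 2003]  # stands in for outputs[0]: prompt + new tokens

# Dropping the first len(prompt_ids) entries leaves only the completion.
new_tokens = generated[len(prompt_ids):]
print(new_tokens)
```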

Local Apps

vLLM

How to use evalengine/unbound-q-0.8b with vLLM:

Install from pip and serve model

```shell
# Install vLLM from pip:
pip install vllm
# Start the vLLM server (listens on port 8000 by default):
vllm serve "evalengine/unbound-q-0.8b"
# In a second terminal, call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "evalengine/unbound-q-0.8b",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```

Use Docker (Docker Model Runner)

```shell
docker model run hf.co/evalengine/unbound-q-0.8b
```

SGLang

How to use evalengine/unbound-q-0.8b with SGLang:

Install from pip and serve model

```shell
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "evalengine/unbound-q-0.8b" \
    --host 0.0.0.0 \
    --port 30000
# In a second terminal, call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "evalengine/unbound-q-0.8b",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
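The curl calls above can also be issued from Python. The following is a minimal sketch using only the standard library; the helper names `build_chat_request` and `chat` are hypothetical, and the response parsing assumes the server returns the usual OpenAI-style `choices[0].message.content` shape.

```python
import json
import urllib.request

MODEL_ID = "evalengine/unbound-q-0.8b"


def build_chat_request(prompt, base_url="http://localhost:8000"):
    """Build a POST request for the OpenAI-compatible chat completions route."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt, base_url="http://localhost:8000", timeout=60):
    """Send the request and return the assistant's reply text.

    Assumes an OpenAI-style response body; needs a running server.
    """
    request = build_chat_request(prompt, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With a vLLM server running, `chat("What is the capital of France?")` should return the reply text; for SGLang as launched above, pass `base_url="http://localhost:30000"`.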

Use Docker images

```shell
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "evalengine/unbound-q-0.8b" \
        --host 0.0.0.0 \
        --port 30000
```

Once the container is up, call the server on port 30000 with the same curl request used for the pip install route.


Unbound Q-0.8B — because there is no boundary

No guarantee — use at your own risk. This model has reduced safety filtering and can produce harmful, false, biased, or unsafe output. Provided as-is; you are responsible for compliance with applicable laws.

Uncensored on-device finetune of unsloth/Qwen3.5-0.8B by the Chromia & Eval Engine team. ~0.8 billion effective parameters, text-only, ~530 MB at Q4_K_M — the smallest member of the Unbound family.

This repo holds the merged HF weights. On-device GGUF builds (Ollama, llama.cpp, LM Studio) are at evalengine/unbound-q-0.8b-GGUF.

Benchmarks (vs base `Qwen3.5-0.8B`)

| Axis | Base | Unbound Q-0.8B | Δ |
|---|---|---|---|
| Refusal rate (AdvBench 520, LLM judge) | 90.58% | 5.00% | −85.58 pts |
| Useful-compliance rate | n/a | 6.35% | n/a |
| Hallucination (on harmful prompts) | n/a | 35.77% | n/a |
| Coherence (benign prompts) | 1.00 | 1.00 | 0 |
| TruthfulQA mc2 (`--limit 100`) | 0.430 | 0.427 | −0.3 pt |
| MMLU (`--limit 100`, 61 subtasks avg) | 0.493 | 0.505 | +1.2 pt |
| GSM8K (`--limit 100`) | 0.41 | 0.42 | +1.0 pt |
| GPQA-Diamond (`--limit 200`) | 23.23% | 26.26% | +3.0 pt (within stderr) |
| BBH macro (24 tasks, `--limit 200`) | 38.21% | 41.29% | +3.1 pt (outside stderr) |
| KL divergence vs base | 0 | 0.605 | (SFT-expected) |

The refusal rate drops from 90.58% to 5.00%, and every capability axis lands flat or up versus base (TruthfulQA mc2 moves by −0.3 pt; the others improve). Q-track is the only Unbound model where SFT improved the release-suite scores: BBH macro rises by about 3 pp, well outside the 0.68 pp stderr. Hallucination on harmful prompts is the dominant gap versus the larger Unbound siblings: a model in the 0.8B class does not reliably produce factual content on the adversarial set. Q-18's decontamination experiment confirmed that the plateau is structural to model size, not a teacher-data artifact.
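To make the percentages concrete, the sketch below converts the AdvBench rates back into approximate prompt counts and recomputes two of the deltas. It assumes every rate is measured over the full 520-prompt set, which the table states only for the refusal row, and it treats the quoted 0.68 pp as the stderr to compare the BBH gain against; neither assumption is confirmed by the card.

```python
ADVBENCH_PROMPTS = 520  # prompt count from the refusal-rate row

rates_pct = {
    "base refusal": 90.58,
    "tuned refusal": 5.00,
    "tuned useful compliance": 6.35,
    "tuned hallucination": 35.77,
}

# Convert each percentage back to an approximate number of prompts
# (assumes all four rates share the 520-prompt denominator).
counts = {name: round(pct / 100 * ADVBENCH_PROMPTS) for name, pct in rates_pct.items()}

# Percentage-point deltas for two rows of the table.
refusal_delta = round(5.00 - 90.58, 2)
bbh_delta = round(41.29 - 38.21, 2)

# BBH gain expressed in units of the quoted 0.68 pp stderr.
bbh_in_stderr = round(bbh_delta / 0.68, 1)

print(counts)
print(refusal_delta, bbh_delta, bbh_in_stderr)
```

Under those assumptions, the 5.00% refusal rate corresponds to 26 of 520 prompts, and the BBH gain is roughly 4.5 times the quoted stderr.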

Sampling

Qwen3.5 non-thinking preset:

temperature=0.7, top_p=0.8, top_k=20, min_p=0.0, presence_penalty=1.5
For factual / brand questions, drop temperature to ~0.3–0.5.
llama.cpp: pass --jinja so that the model's bundled chat template is applied.
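One way to apply these settings is to merge them into the body of an OpenAI-style chat completions request. The sketch below is illustrative: the helper names are hypothetical, `factual_temperature=0.4` is an assumed value inside the suggested 0.3 to 0.5 range, and `top_k` and `min_p` are not part of the core OpenAI schema, so whether a given server accepts them as extra fields should be checked against its documentation.

```python
# Preset values copied from the sampling list above.
NON_THINKING_PRESET = {
    "temperature": 0.7,
    "top_p": 0.8,
    "top_k": 20,
    "min_p": 0.0,
    "presence_penalty": 1.5,
}


def sampling_params(factual=False, factual_temperature=0.4):
    """Return a copy of the preset, with a lower temperature for factual questions.

    factual_temperature=0.4 is an assumed pick from the suggested 0.3-0.5 range.
    """
    params = dict(NON_THINKING_PRESET)
    if factual:
        params["temperature"] = factual_temperature
    return params


def chat_payload(prompt, factual=False):
    """Merge the sampling fields into an OpenAI-style chat completions body."""
    return {
        "model": "evalengine/unbound-q-0.8b",
        "messages": [{"role": "user", "content": prompt}],
        **sampling_params(factual),
    }
```

When calling `model.generate` in Transformers instead, the first four values map onto `temperature`, `top_p`, `top_k`, and `min_p` together with `do_sample=True`; `presence_penalty` is a request-level option of OpenAI-style servers rather than a `generate` argument.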

Use

```shell
# on-device (GGUF)
ollama pull hf.co/evalengine/unbound-q-0.8b-GGUF
ollama run hf.co/evalengine/unbound-q-0.8b-GGUF
```

```python
# transformers
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("evalengine/unbound-q-0.8b")
tok = AutoTokenizer.from_pretrained("evalengine/unbound-q-0.8b")
```

Acknowledgements

Fine-tuned with Unsloth + HF TRL. Abliteration via heretic. Compliance training data distilled from AEON and audited row-by-row; 48 major-fabrication rows decontaminated before training.

License

Apache-2.0, inherited from Qwen/Qwen3.5-0.8B.


Safetensors: model size 0.8B params, tensor type BF16.

Model tree for evalengine/unbound-q-0.8b

Base model: Qwen/Qwen3.5-0.8B-Base
Finetuned: Qwen/Qwen3.5-0.8B
Finetuned: unsloth/Qwen3.5-0.8B
Finetuned (104): this model

