Unbound Q-0.8B — because there is no boundary
No guarantee — use at your own risk. This model has reduced safety filtering and can produce harmful, false, biased, or unsafe output. Provided as-is; you are responsible for compliance with applicable laws.
Uncensored on-device finetune of unsloth/Qwen3.5-0.8B by the Chromia &
Eval Engine team. ~0.8 billion effective parameters, text-only, ~530 MB at
Q4_K_M — the smallest member of the Unbound family.
This repo holds the merged HF weights. On-device GGUF builds (Ollama,
llama.cpp, LM Studio) are at
evalengine/unbound-q-0.8b-GGUF.
Benchmarks (vs base Qwen3.5-0.8B)
| Axis | Base | Unbound Q-0.8B | Δ |
|---|---|---|---|
| Refusal rate (AdvBench 520, LLM judge) | 90.58% | 5.00% | −85.58 pts |
| Useful-compliance rate | n/a | 6.35% | — |
| Hallucination (on harmful prompts) | n/a | 35.77% | — |
| Coherence (benign prompts) | 1.00 | 1.00 | 0 |
| TruthfulQA mc2 (--limit 100) | 0.430 | 0.427 | −0.3 pt |
| MMLU (--limit 100, 61 subtasks avg) | 0.493 | 0.505 | +1.2 pt |
| GSM8K (--limit 100) | 0.41 | 0.42 | +1.0 pt |
| GPQA-Diamond (--limit 200) | 23.23% | 26.26% | +3.0 pt (within stderr) |
| BBH macro (24 tasks, --limit 200) | 38.21% | 41.29% | +3.1 pt (outside stderr) |
| KL divergence vs base | 0 | 0.605 | (SFT-expected) |
Refusal drops from 91% to 5%, and every capability axis lands flat or up vs base. Q-track is the only Unbound model where SFT improved the release-suite scores (BBH macro +3 pp, well outside the 0.68 pp stderr). Hallucination on harmful prompts is the dominant gap vs the larger Unbound siblings: this 0.8B class doesn't reliably produce factual content on the adversarial set. Q-18's decontamination experiment confirmed that the plateau is structural to model size, not a teacher-data artifact.
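The Δ column can be rechecked with a few lines of arithmetic. The sketch below copies the scores from the table, converts the fractional ones (TruthfulQA, MMLU, GSM8K) to percentage points, and divides the BBH delta by the 0.68 pp stderr quoted above. Whether that stderr applies to a single score or to the difference is not stated here, so the ratio is only a rough indication.

```python
# Scores copied from the benchmark table as (base, unbound), in percentage points.
# Fractional table entries (e.g. 0.430) are written here as percentages (43.0).
scores = {
    "Refusal rate (AdvBench 520)": (90.58, 5.00),
    "TruthfulQA mc2": (43.0, 42.7),
    "MMLU": (49.3, 50.5),
    "GSM8K": (41.0, 42.0),
    "GPQA-Diamond": (23.23, 26.26),
    "BBH macro": (38.21, 41.29),
}


def delta_points(base, tuned):
    """Difference in percentage points, rounded to two decimals."""
    return round(tuned - base, 2)


deltas = {name: delta_points(base, tuned) for name, (base, tuned) in scores.items()}

# Stderr quoted in the text for BBH macro. Assumption: it is comparable to the delta.
BBH_STDERR_PP = 0.68
bbh_ratio = deltas["BBH macro"] / BBH_STDERR_PP

for name, delta in deltas.items():
    print(f"{name}: {delta:+.2f} pt")
print(f"BBH delta / stderr: {bbh_ratio:.1f}")
```

The table rounds these deltas to one decimal, which is why GPQA-Diamond and BBH appear there as +3.0 and +3.1.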
Sampling
- Qwen3.5 non-thinking preset: `temperature=0.7, top_p=0.8, top_k=20, min_p=0.0, presence_penalty=1.5`
- For factual / brand questions, drop `temperature` to ~0.3–0.5.
- llama.cpp: pass `--jinja`.
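One way to keep these settings in a single place is a small preset helper. This is a minimal sketch: `QWEN_NON_THINKING` and `sampling_params` are hypothetical names, not part of any library, and the default factual temperature of 0.4 is just a midpoint of the suggested range. Not every runtime accepts every key, so filter the dict for the backend you use.

```python
# Preset values copied from the list above.
QWEN_NON_THINKING = {
    "temperature": 0.7,
    "top_p": 0.8,
    "top_k": 20,
    "min_p": 0.0,
    "presence_penalty": 1.5,
}


def sampling_params(factual=False, factual_temperature=0.4):
    """Return a copy of the preset, with a lower temperature for factual questions."""
    params = dict(QWEN_NON_THINKING)  # copy so the shared preset is never mutated
    if factual:
        if not 0.3 <= factual_temperature <= 0.5:
            raise ValueError("factual_temperature should be within 0.3-0.5")
        params["temperature"] = factual_temperature
    return params


chat_params = sampling_params()
factual_params = sampling_params(factual=True)
```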
Use
# on-device (GGUF)
ollama pull hf.co/evalengine/unbound-q-0.8b-GGUF
ollama run hf.co/evalengine/unbound-q-0.8b-GGUF
# transformers
from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained("evalengine/unbound-q-0.8b")
tok = AutoTokenizer.from_pretrained("evalengine/unbound-q-0.8b")
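The snippet above only loads the weights. A minimal generation sketch could look like the following, assuming the tokenizer ships a chat template and that a recent `transformers` with PyTorch is installed. `build_messages` and `generate_reply` are hypothetical helper names. Only `temperature`, `top_p`, and `top_k` are passed to `generate`, because `presence_penalty` is not one of its arguments.

```python
MODEL_ID = "evalengine/unbound-q-0.8b"


def build_messages(user_text):
    """Wrap a single user turn in the chat-message format."""
    return [{"role": "user", "content": user_text}]


def generate_reply(user_text, max_new_tokens=128):
    """Load the model, apply the chat template, and return the decoded reply."""
    # Imported lazily so build_messages works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

    inputs = tok.apply_chat_template(
        build_messages(user_text),
        add_generation_prompt=True,
        tokenize=True,
        return_dict=True,
        return_tensors="pt",
    ).to(model.device)

    outputs = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=True,
        temperature=0.7,
        top_p=0.8,
        top_k=20,
    )
    # Drop the prompt tokens so only the newly generated text is decoded.
    prompt_len = inputs["input_ids"].shape[-1]
    return tok.decode(outputs[0][prompt_len:], skip_special_tokens=True)
```

Calling `print(generate_reply("Who are you?"))` downloads the weights on first use. For repeated calls, load the model once and reuse it rather than reloading inside the function.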
Acknowledgements
Fine-tuned with Unsloth + HF TRL. Abliteration via heretic. Compliance training data distilled from AEON and audited row-by-row; 48 major-fabrication rows decontaminated before training.
Links
- Larger Unbound siblings: E2B · E4B
- Unbound — unbound.evalengine.ai
- Eval Engine — evalengine.ai · X / Twitter
- Token — CoinGecko · CoinMarketCap
License
Apache-2.0, inherited from Qwen/Qwen3.5-0.8B.