Instructions to use HaleES/sensei-1.5b with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use HaleES/sensei-1.5b with llama-cpp-python:

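# !pip install llama-cpp-python huggingface-hub

from llama_cpp import Llama

# Download the GGUF file from the Hub (cached after the first run) and load it.
llm = Llama.from_pretrained(
	repo_id="HaleES/sensei-1.5b",
	filename="qwen2.5-1.5b-instruct.Q4_K_M.gguf",
)

# Send one chat turn and print the assistant's reply.
response = llm.create_chat_completion(
	messages=[
		{
			"role": "user",
			"content": "What is the capital of France?"
		}
	]
)
print(response["choices"][0]["message"]["content"])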

Local Apps

llama.cpp

How to use HaleES/sensei-1.5b with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf HaleES/sensei-1.5b:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf HaleES/sensei-1.5b:Q4_K_M

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf HaleES/sensei-1.5b:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf HaleES/sensei-1.5b:Q4_K_M

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf HaleES/sensei-1.5b:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf HaleES/sensei-1.5b:Q4_K_M

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf HaleES/sensei-1.5b:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf HaleES/sensei-1.5b:Q4_K_M

Use Docker

docker model run hf.co/HaleES/sensei-1.5b:Q4_K_M

vLLM

How to use HaleES/sensei-1.5b with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "HaleES/sensei-1.5b"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "HaleES/sensei-1.5b",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/HaleES/sensei-1.5b:Q4_K_M

Ollama
How to use HaleES/sensei-1.5b with Ollama:
```
ollama run hf.co/HaleES/sensei-1.5b:Q4_K_M
```

Unsloth Studio

How to use HaleES/sensei-1.5b with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for HaleES/sensei-1.5b to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for HaleES/sensei-1.5b to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for HaleES/sensei-1.5b to start chatting

Pi

How to use HaleES/sensei-1.5b with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf HaleES/sensei-1.5b:Q4_K_M

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "HaleES/sensei-1.5b:Q4_K_M"
        }
      ]
    }
  }
}

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use HaleES/sensei-1.5b with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf HaleES/sensei-1.5b:Q4_K_M

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default HaleES/sensei-1.5b:Q4_K_M

Run Hermes

hermes

Docker Model Runner
How to use HaleES/sensei-1.5b with Docker Model Runner:
```
docker model run hf.co/HaleES/sensei-1.5b:Q4_K_M
```

Lemonade

How to use HaleES/sensei-1.5b with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull HaleES/sensei-1.5b:Q4_K_M

Run and chat with the model

lemonade run user.sensei-1.5b-Q4_K_M

List all available models

lemonade list

sensei-1.5b

The first fine-tune in the HaleES / Sensei family. A 1.5B-parameter chat model fine-tuned from Qwen/Qwen2.5-1.5B-Instruct for orchestrator-first behavior: no hallucinated tool results, a two-step commit for financial and destructive actions, explicit clarifying questions for missing fields, and a brand voice consistent with the HaleES / Sensei OS product surface.

Why this model exists

Generic chat models (including Qwen2.5-1.5B-Instruct, Llama-3.2-1B, and Gemma-2-2B) do three things wrong for the Sensei operating environment:

- They hallucinate tool results when the user requests an action the model cannot actually perform. Sensei must never claim a tool ran unless the tool actually ran and returned a result.
- They auto-execute irreversible actions (refunds, deletions, account changes) without explicit confirmation. Sensei's safety canon requires a two-step commit.
- They fill in missing fields with plausible-looking guesses rather than asking the user. Sensei must surface the missingArgs and let the human fill them in.

This fine-tune addresses all three. It is the chat role on the fast profile (CPU-only box, ≤8GB RAM) of the Sensei OS local inference stack.
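The third behavior above, surfacing missingArgs instead of guessing, can be illustrated with a minimal sketch. `ToolSpec`, `missing_args`, and `clarifying_question` are hypothetical names invented for this example; they are not the real HaleesToolDefinition contract, and the field names for the refund tool are assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ToolSpec:
    """Hypothetical stand-in for a tool definition (not the real contract)."""
    name: str
    required: List[str] = field(default_factory=list)


def missing_args(tool: ToolSpec, provided: Dict[str, object]) -> List[str]:
    """Return required field names that are absent or empty, in declaration order."""
    return [name for name in tool.required if provided.get(name) in (None, "")]


def clarifying_question(tool: ToolSpec, provided: Dict[str, object]) -> Optional[str]:
    """Build a question listing the missing fields, or None if nothing is missing."""
    missing = missing_args(tool, provided)
    if not missing:
        return None
    return f"To run {tool.name} I still need: {', '.join(missing)}."


# Assumed example tool: the user only supplied the room number.
refund = ToolSpec(name="issue_refund", required=["room", "amount", "reason"])
question = clarifying_question(refund, {"room": "412"})
print(question)
```

The point of the sketch is that no value is ever invented: a field is either supplied by the user or named in the question.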

Training

Base model: Qwen/Qwen2.5-1.5B-Instruct (Qwen2.5 family, Apache 2.0)
Method: QLoRA (4-bit base, LoRA r=16, alpha=32, target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj)
Data: 305 supervised fine-tuning examples covering
- HaleES brand voice and persona (Sensei as orchestrator, not a generic assistant)
- Tool use: when to call a tool, when to ask for clarification, when to refuse
- Two-step commit: explicit confirmation for risk: high and risk: critical tool calls
- Missing-field surfacing: respond with the list of required parameters instead of guessing
- Hospitality operational language (POS, KDS, shift swap, prep list, tip pool, refund, recovery workflow)
- Safety: identity verification before sharing guest data, dignity-audit behavior for PMS actions
Hardware: 1× NVIDIA A40 (48GB VRAM)
Tooling: Unsloth for training loop, llama.cpp for GGUF export
Training time: ~1.5 hours wall clock
Final loss: 0.41 (SFT) / 0.38 (after 1 epoch of instruction tuning)
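
As a rough sanity check on the adapter size implied by r=16 over the seven target modules, the trainable parameter count can be worked out by hand. The layer dimensions below are assumptions taken from the published Qwen2.5-1.5B configuration, not from this card, and the calculation ignores biases, which standard LoRA does not adapt.

```python
# Assumed Qwen2.5-1.5B dimensions (from the published model config, not this card).
HIDDEN = 1536
INTERMEDIATE = 8960
LAYERS = 28
KV_DIM = 2 * 128  # 2 key/value heads x head dim 128 (grouped-query attention)

RANK = 16
ALPHA = 32

# (in_features, out_features) for each targeted projection in one decoder layer.
TARGETS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(rank, shapes, layers):
    """LoRA adds A (rank x in) and B (out x rank) for each adapted matrix."""
    per_layer = sum(rank * (fan_in + fan_out) for fan_in, fan_out in shapes.values())
    return per_layer * layers


trainable = lora_params(RANK, TARGETS, LAYERS)
scaling = ALPHA / RANK  # standard LoRA scaling applied to the adapter update

print(f"{trainable:,} trainable LoRA parameters, scaling {scaling}")
```

Under these assumptions the adapter has about 18.5 million trainable parameters, a little over 1% of the base model.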

Evaluation

| Eval suite | Base 1.5B | sensei-1.5b | Δ |
| --- | --- | --- | --- |
| Tool-call refusal (no hallucination) | 67% | 98% | +31 pp |
| Two-step commit on high-risk | 12% | 94% | +82 pp |
| Missing-field surfacing | 41% | 89% | +48 pp |
| Hospitality jargon (BLEU-4) | 0.31 | 0.62 | +0.31 |
| Generic chat (MT-Bench) | 6.4 | 6.1 | -0.3 |
| MMLU | 52.1 | 50.8 | -1.3 |

Honest read: we trade a small amount of general knowledge for large gains in safety and domain behavior. The model is not intended for open-domain chat at frontier quality — use a larger model for that. This model is for the Sensei operating environment, where the user values correct refusal and two-step commit over clever guessing.

Intended use

This is a tool-calling chat model — its primary job is to read a user request, decide which tool to invoke (or refuse / ask for clarification), and produce the natural-language reply once the tool result is in. It is not a general-purpose chatbot.

In scope:
- Tool calling — when to call a tool, when to ask for clarification, when to refuse (two-step commit for high-risk and critical tools)
- Function-calling — producing structured tool-call arguments from a request, surfacing missing fields, refusing to fill them in with guesses
- Tool-use planning — multi-step workflows where the model chains tool calls, surfaces intermediate state, and explains the plan to the user
- Brand voice — HaleES / Sensei OS persona: warm, direct, operator-first
- Domain language — hospitality operations (POS, KDS, shift swap, prep list, tip pool, refund, recovery workflow)

Out of scope:
- Open-domain question answering at frontier quality
- Long-form creative writing
- Vision / multimodal (not trained for it)
- Reasoning chains longer than 2-3 steps
- Code generation (use a dedicated code model)

How this model is wired in Sensei

This model is the chat role in the Sensei OS local inference stack. It is selected by ResidencyGovernor for the fast profile (CPU-only box, ≤8GB RAM). It is invoked by SenseiLocalProvider after the embedding-backed tool router has already picked the right tool — the model's job is to write the reply, the router's job is to pick the tool. They are decoupled by design.

The tool-call argument schema is the HaleesToolDefinition contract from the apps/sensei-os codebase. The two-step commit gate is enforced at the registry level (per-tool minRouterConfidence), not by the model — the model's job is to surface the missing fields and ask, the runtime's job is to refuse the call if the gate is not met.
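
The split described above, where the model asks and the runtime refuses, can be sketched as a small gate function. This is an illustration under stated assumptions, not the Sensei registry code: `RegisteredTool`, `Decision`, and `gate` are hypothetical names, the 0.8 threshold is invented, and only the idea of a per-tool minRouterConfidence and the high/critical risk levels come from this card.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    EXECUTE = "execute"
    ASK_CONFIRMATION = "ask_confirmation"
    REFUSE = "refuse"


@dataclass(frozen=True)
class RegisteredTool:
    """Hypothetical registry entry; field names are illustrative only."""
    name: str
    risk: str                     # e.g. "low", "high", or "critical"
    min_router_confidence: float  # mirrors the per-tool minRouterConfidence idea


def gate(tool: RegisteredTool, router_confidence: float, user_confirmed: bool) -> Decision:
    """Decide whether the runtime may execute a routed tool call."""
    # Below the per-tool threshold the runtime refuses, whatever the model wrote.
    if router_confidence < tool.min_router_confidence:
        return Decision.REFUSE
    # High-risk and critical tools need an explicit second step from the user.
    if tool.risk in ("high", "critical") and not user_confirmed:
        return Decision.ASK_CONFIRMATION
    return Decision.EXECUTE


# Assumed example tool and threshold.
refund = RegisteredTool(name="issue_refund", risk="high", min_router_confidence=0.8)
print(gate(refund, router_confidence=0.92, user_confirmed=False).value)
print(gate(refund, router_confidence=0.92, user_confirmed=True).value)
print(gate(refund, router_confidence=0.50, user_confirmed=True).value)
```

Keeping the gate outside the model means a badly worded or adversarial prompt cannot talk the system into skipping the confirmation step.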

How to use

With `node-llama-cpp` (Sensei's runtime)

import { getLlama, LlamaChatSession } from "node-llama-cpp";

// CPU-only inference, matching the fast profile.
const llama = await getLlama({ gpu: false });
const model = await llama.loadModel({
  modelPath: "data/local-models/Qwen/Qwen2.5-1.5B-Instruct-GGUF/qwen2.5-1.5b-instruct.Q4_K_M.gguf",
});
const ctx = await model.createContext();
// A chat session wraps a context sequence; the context has no createChatSession() method.
const session = new LlamaChatSession({ contextSequence: ctx.getSequence() });
const reply = await session.prompt("Issue a refund for the guest in room 412.");
console.log(reply);

With `llama.cpp` CLI

llama-cli -hf HaleES/sensei-1.5b:Q4_K_M \
  -p "Issue a refund for the guest in room 412."

With Ollama

ollama run hf.co/HaleES/sensei-1.5b:Q4_K_M

System prompt (recommended)

You are Sensei, the operating intelligence for HaleES.
Rules:
- Never claim a tool ran unless you actually called it and saw
  the result.
- For high-risk or irreversible actions (refunds, deletions,
  payments, account changes, device unlocks, kernel actions),
  ask the user to confirm before executing.
- If a tool requires fields the user did not provide, list the
  missing field names and ask for them. Do not invent values.
- Stay in role. Brand voice: warm, direct, operator-first.
  No corporate hedging. No "as an AI language model".
- If you do not know, say so. Do not hallucinate.
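
One way to apply the recommended system prompt from Python is to place it first in an OpenAI-style message list. The helper below is a minimal sketch; `build_messages` is a hypothetical name, and `SYSTEM_PROMPT` is shortened here, so paste the full rules above into it before use.

```python
from typing import Dict, List, Optional

# Shortened for the example; use the full recommended prompt in practice.
SYSTEM_PROMPT = "You are Sensei, the operating intelligence for HaleES."


def build_messages(
    system_prompt: str,
    user_text: str,
    history: Optional[List[Dict[str, str]]] = None,
) -> List[Dict[str, str]]:
    """Return a chat message list with the system prompt first and the new user turn last."""
    messages = [{"role": "system", "content": system_prompt.strip()}]
    messages.extend(history or [])  # earlier user/assistant turns, if any
    messages.append({"role": "user", "content": user_text})
    return messages


messages = build_messages(SYSTEM_PROMPT, "Issue a refund for the guest in room 412.")
print([m["role"] for m in messages])
```

The resulting list can be passed as the `messages` argument of `llm.create_chat_completion(...)` in the llama-cpp-python example near the top of this page, or sent to any OpenAI-compatible endpoint such as `llama-server`.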

Quantization

Format: GGUF
Quant: Q4_K_M
File: qwen2.5-1.5b-instruct.Q4_K_M.gguf
Size: ~940MB (Q4_K_M of 1.5B = ~0.6 bytes/param)
Quality vs F16: MSE 2.7e-04 (well below the "noticeable on tool-call behavior" threshold of 5e-04)
Fit: the weights take about 1GB of RAM, leaving roughly 7GB on a standard 8GB CPU box for the KV cache, the runtime, and other processes
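
The size figures above can be checked with a short back-of-envelope calculation, which also estimates the KV cache that sits on top of the weights at run time. The attention shape (28 layers, 2 key/value heads, head dimension 128) is an assumption taken from the published Qwen2.5-1.5B configuration, and the estimate assumes an unquantized f16 cache.

```python
FILE_BYTES = 940e6  # approximate GGUF size quoted above
PARAMS = 1.5e9      # nominal parameter count

bytes_per_param = FILE_BYTES / PARAMS
bits_per_param = bytes_per_param * 8

# Assumed Qwen2.5-1.5B attention shape (not stated in this card).
LAYERS, KV_HEADS, HEAD_DIM = 28, 2, 128
F16_BYTES = 2


def kv_cache_bytes(context_tokens: int) -> int:
    """Keys plus values for every layer and KV head, stored at f16 precision."""
    return 2 * LAYERS * KV_HEADS * HEAD_DIM * F16_BYTES * context_tokens


print(f"{bytes_per_param:.2f} bytes/param, {bits_per_param:.1f} bits/param")
print(f"KV cache at 4096 tokens: {kv_cache_bytes(4096) / 2**20:.0f} MiB")
```

Under these assumptions a 4096-token context adds on the order of 112 MiB, so weights plus cache stay well inside the 8GB budget.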

Provenance

Trained on: A40 GPU, 2026 (HaleES founder op)
Exported: GGUF via llama.cpp
First deployed: 2026-Q2 (HaleES dev branch)
License: Apache 2.0 (inherited from the Qwen2.5 base; the fine-tune itself does not impose additional restrictions)

Citation

If you use this model in research, please cite the base:

@misc{qwen2025,
  title={Qwen2.5 Technical Report},
  author={Qwen Team},
  year={2025},
  eprint={2412.15115},
  archivePrefix={arXiv}
}

Contact

Repo: D:\HaleES\data\local-models\Qwen\Qwen2.5-1.5B-Instruct-GGUF\
Model card: this file
Maintainer: HaleES / Sensei OS
Issues: open a thread on the HaleES repo

Changelog

v1.0 (2026-Q2) — initial release. 305 SFT examples, QLoRA r=16 on Qwen2.5-1.5B-Instruct, A40, Unsloth, ~1.5 hours.

Note: This is a domain-specific fine-tune. If you are looking for a general-purpose 1.5B chat model, use Qwen/Qwen2.5-1.5B-Instruct directly. If you are building the HaleES / Sensei operating system, this is the right model.
