How to use with llama.cpp
Install from Homebrew (macOS/Linux)
brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI
# (the tag after the colon selects a quant; Q3_K_M is this card's recommended research quant):
llama-server -hf Acnoryx/Airy:Q3_K_M
# Run inference directly in the terminal:
llama-cli -hf Acnoryx/Airy:Q3_K_M
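Once the server is running, any OpenAI-compatible client can talk to it. A minimal sketch with curl, assuming the default host and port of 127.0.0.1:8080 (override with --host/--port):
# Chat completion against the local server (the "model" field is
# accepted but not used to pick a model when only one is loaded):
curl http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Hello!"}], "max_tokens": 128}'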
Install from WinGet (Windows)
winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Acnoryx/Airy:Q3_K_M
# Run inference directly in the terminal:
llama-cli -hf Acnoryx/Airy:Q3_K_M
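For scripted, non-interactive use, llama-cli can take the prompt on the command line; a minimal sketch using the standard -p (prompt) and -n (max new tokens) flags:
# One-shot generation, then exit:
llama-cli -hf Acnoryx/Airy:Q3_K_M -p "Summarize GGUF in one sentence." -n 128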
Use a pre-built binary
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf Acnoryx/Airy:Q3_K_M
# Run inference directly in the terminal:
./llama-cli -hf Acnoryx/Airy:Q3_K_M
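The same binaries accept the usual llama.cpp runtime flags; a sketch of a common configuration (the values are illustrative, not tuned for this model):
# --port: listen port, -c: context size in tokens,
# -ngl: layers to offload to the GPU (needs a GPU-enabled build):
./llama-server -hf Acnoryx/Airy:Q3_K_M --port 8080 -c 4096 -ngl 99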
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf Acnoryx/Airy:Q3_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf Acnoryx/Airy:Q3_K_M
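For GPU offload, enable a backend at configure time; a sketch for NVIDIA GPUs with the CUDA toolkit installed (other backends use analogous GGML_* options):
# Configure with CUDA support, then rebuild the same targets:
cmake -B build -DGGML_CUDA=ON
cmake --build build -j --target llama-server llama-cli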
Use Docker
docker model run hf.co/Acnoryx/Airy:Q3_K_M
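Alternatively, the upstream llama.cpp server image can host the model; a sketch assuming the ghcr.io image tag and that arguments after the image name are passed through to llama-server:
# Expose the server on the host and bind inside the container:
docker run -p 8080:8080 ghcr.io/ggml-org/llama.cpp:server \
  -hf Acnoryx/Airy:Q3_K_M --host 0.0.0.0 --port 8080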
Quick Links

Acnoryx AI Research Bundle

Overview

  • Base model: Qwen/Qwen3.5-0.8B
  • Model size: 0.8B parameters
  • Architecture: qwen35
  • Research quantizations: Q3_K_M, IQ3_M, Q2_K, IQ2_M, IQ2_XS, IQ2_XXS, IQ1_M, IQ1_S (each tag is a separate GGUF file; see the download sketch after this list)
  • Purpose: evaluate quality vs. size trade-offs below the production threshold
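
To fetch a single research quant without pulling the whole repository, a filename filter with huggingface-cli works; the *Q3_K_M* pattern below is an assumption, so check the repo's file list for the exact GGUF names:
# Download only the Q3_K_M file into ./models:
huggingface-cli download Acnoryx/Airy --include "*Q3_K_M*.gguf" --local-dir ./models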

Notes

  • IQ1/IQ2 formats require an importance matrix (imatrix); see the quantization sketch after this list.
  • These files are more experimental than the release bundle.
  • Production-facing use should prefer the release bundle.
  • If prompting in Vietnamese, write with full accents for best consistency.
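
For context on the imatrix note above, this is roughly how imatrix-dependent quants are produced with llama.cpp's own tools; the file names are placeholders:
# 1) Collect importance statistics from a calibration text file:
llama-imatrix -m Airy-f16.gguf -f calibration.txt -o imatrix.dat
# 2) Quantize to an IQ format using those statistics:
llama-quantize --imatrix imatrix.dat Airy-f16.gguf Airy-IQ2_M.gguf IQ2_M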

Evaluation Snapshot

Results for the research GGUFs combine the earlier runs with the latest rerun on the same curated 58-question bilingual benchmark.

Quant     Think    No-Think   Avg      Status
Q3_K_M    74.1%    72.4%      73.2%    Best current research quant
IQ3_M     60.3%    60.3%      60.3%    Heavy quality loss
IQ2_M     20.7%    19.0%      19.8%    Below usable threshold
IQ2_XS     5.2%     3.4%       4.3%    Triggered early-stop for lower bits

Research Guidance

  • Public research recommendation: Q3_K_M only
  • IQ3_M remains available for experiments, but its quality is clearly degraded
  • The rerun auto-stopped below IQ2_XS because the average pass rate fell under 50%, so lower-bit quants should be treated as archival artifacts rather than viable deployments
  • For any user-facing scenario, prefer the release bundle over this research branch

For cross-family ranking and release-vs-research comparison, see results/COMPARISON.md in the workspace.
