Instructions to use squ11z1/Hypnos-i2-32B with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use squ11z1/Hypnos-i2-32B with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="squ11z1/Hypnos-i2-32B",
	filename="Hypnos-i2-32B-Q4_K_M.gguf",
)

llm.create_chat_completion(
	messages = [
		{
			"role": "user",
			"content": "What is the capital of France?"
		}
	]
)
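
`create_chat_completion` returns an OpenAI-style response dictionary rather than a plain string. A minimal sketch of pulling the reply text out of it follows; `extract_reply` is a hypothetical helper, and `sample_response` is a hand-written stand-in for a real response, not actual model output.

```python
from typing import Any


def extract_reply(response: dict[str, Any]) -> str:
    """Return the assistant text from an OpenAI-style chat completion dict."""
    choices = response.get("choices") or []
    if not choices:
        raise ValueError("response contains no choices")
    # Each choice carries a message with a role and the generated content.
    return choices[0]["message"]["content"]


# Hand-written stand-in with the same shape as a chat completion response.
sample_response = {
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Paris."},
            "finish_reason": "stop",
        }
    ]
}

print(extract_reply(sample_response))
```

In practice, pass the return value of `llm.create_chat_completion(...)` to the helper in place of `sample_response`.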

Local Apps

llama.cpp

How to use squ11z1/Hypnos-i2-32B with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf squ11z1/Hypnos-i2-32B:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf squ11z1/Hypnos-i2-32B:Q4_K_M

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf squ11z1/Hypnos-i2-32B:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf squ11z1/Hypnos-i2-32B:Q4_K_M

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf squ11z1/Hypnos-i2-32B:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf squ11z1/Hypnos-i2-32B:Q4_K_M

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf squ11z1/Hypnos-i2-32B:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf squ11z1/Hypnos-i2-32B:Q4_K_M

Use Docker

docker model run hf.co/squ11z1/Hypnos-i2-32B:Q4_K_M

vLLM

How to use squ11z1/Hypnos-i2-32B with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "squ11z1/Hypnos-i2-32B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "squ11z1/Hypnos-i2-32B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
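
The same request can be issued from Python with only the standard library. This is a sketch that assumes the server started by `vllm serve` is listening on `localhost:8000` as in the curl call; `build_chat_request` and `send_chat_request` are hypothetical helper names.

```python
import json
import urllib.request


def build_chat_request(
    prompt: str,
    model: str = "squ11z1/Hypnos-i2-32B",
    base_url: str = "http://localhost:8000/v1",
) -> urllib.request.Request:
    """Build the POST request that the curl command sends."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat_request(request: urllib.request.Request, timeout: float = 120.0) -> str:
    """Send the request to a running server and return the reply text."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


# Building the request needs no server; sending it does.
request = build_chat_request("What is the capital of France?")
print(request.get_method(), request.full_url)
```

With the server running, `print(send_chat_request(request))` prints the model's reply.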

Ollama
How to use squ11z1/Hypnos-i2-32B with Ollama:
```
ollama run hf.co/squ11z1/Hypnos-i2-32B:Q4_K_M
```

Unsloth Studio

How to use squ11z1/Hypnos-i2-32B with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for squ11z1/Hypnos-i2-32B to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for squ11z1/Hypnos-i2-32B to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for squ11z1/Hypnos-i2-32B to start chatting

Pi

How to use squ11z1/Hypnos-i2-32B with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf squ11z1/Hypnos-i2-32B:Q4_K_M

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "squ11z1/Hypnos-i2-32B:Q4_K_M"
        }
      ]
    }
  }
}

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use squ11z1/Hypnos-i2-32B with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf squ11z1/Hypnos-i2-32B:Q4_K_M

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default squ11z1/Hypnos-i2-32B:Q4_K_M

Run Hermes

hermes

Docker Model Runner
How to use squ11z1/Hypnos-i2-32B with Docker Model Runner:
```
docker model run hf.co/squ11z1/Hypnos-i2-32B:Q4_K_M
```

Lemonade

How to use squ11z1/Hypnos-i2-32B with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull squ11z1/Hypnos-i2-32B:Q4_K_M

Run and chat with the model

lemonade run user.Hypnos-i2-32B-Q4_K_M

List all available models

lemonade list

Hypnos-i2-32B (Multi-Source Quantum Reasoning Model)

Quantum-Reasoning Engine. The first 32B model trained on Multi-Physical Entropy (Superconductors + Vacuum + Nuclear Decay).

Built by scientists, for scientists.

🌌 Overview

Hypnos-i2-32B represents a breakthrough in language model training: the world's first 32B parameter model trained with Input-Level Quantum Regularization from three independent quantum entropy sources.

Unlike traditional LLMs that rely purely on pseudo-random noise during training, Hypnos-i2 learns from true quantum randomness extracted from:

- MATTER: Superconducting qubit decoherence (IBM Quantum Heron, 133-qubit processors)
- LIGHT: Quantum vacuum fluctuations (ANU Quantum Random Number Generator)
- NUCLEUS: Radioactive decay timing (Fourmilab HotBits, Strontium-90)

This creates attention mechanisms that are inherently robust to adversarial perturbations and resistant to mode collapse.

🚀 Key Features

- 32B Parameters: Based on Qwen3-32B architecture
- Multi-QPU Training: Three orthogonal quantum entropy sources
- Input-Level Regularization: Quantum noise embedded in training contexts
- Enhanced Robustness: Improved adversarial resistance and reduced repetition
- Production-Ready: Full fine-tuning with quantum-augmented data

📊 Performance Highlights

Core Capabilities

| Benchmark | Hypnos-i2-32B | Qwen3-32B Base | Delta |
|---|---|---|---|
| ArenaHard | 94.9 | 93.8 | +1.1 |
| AIME '24 | 86.2 | 81.4 | +4.8 |
| AIME '25 | 79.5 | 72.9 | +6.6 |
| LiveBench | 64.1 | 49.3 | +14.8 |
| CodeForces | 2045 | 1977 | +68 |

Robustness Metrics

| Benchmark | Discipline | Hypnos-i2-32B | Qwen3-32B Base | Llama-3.1-405B | Mistral-Large-2411 | Deepseek-R1 | Llama 4 Maverick |
|---|---|---|---|---|---|---|---|
| Hallucination | Safety | 2.3% | 5.9% | 5.2% | 4.5% | 14.3% | 8.2% |

In this comparison, Multi-Physical Entropy training lowers the hallucination rate from 5.9% for the Qwen3-32B base to 2.3%, the lowest of the models listed.

🔬 Technical Innovation: Quantum Regularization

The Problem

Traditional language models suffer from:

- Mode collapse: repetitive, looping outputs
- Adversarial vulnerability: susceptibility to prompt injection
- Overfitting: limited generalization to novel scenarios

The Solution

Input-Level Quantum Entropy Injection works as follows:

- Quantum Sampling: Before each training batch, unique entropy sequences are drawn from all three quantum sources
- Context Augmentation: These sequences are embedded into the context window of training examples
- Attention Learning: The model learns to distinguish signal (reasoning patterns) from quantum noise
- Emergent Robustness: Attention heads develop resistance to high-entropy perturbations

This creates a regularization effect similar to Dropout, but data-driven and grounded in fundamental physics rather than architecture hacks.
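
The Quantum Sampling and Context Augmentation steps can be pictured with a small sketch. It is not the authors' training pipeline: the card does not say how entropy is encoded or where it sits in the context, so the hex encoding, the `<entropy>` delimiters, and every function name here are assumptions, and `os.urandom` stands in for the three quantum sources. Attention Learning and Emergent Robustness are described as outcomes of training on the augmented text, so the sketch stops at augmentation.

```python
import os
from typing import Callable, Sequence

# Stand-in type for an entropy source (assumption: a real pipeline would
# fetch bytes from the three hardware sources rather than from the OS).
EntropySource = Callable[[int], bytes]


def sample_entropy(sources: Sequence[EntropySource], n_bytes: int) -> list[bytes]:
    """Quantum Sampling: draw one fresh sequence from every source."""
    return [source(n_bytes) for source in sources]


def augment_context(example: str, entropy: Sequence[bytes]) -> str:
    """Context Augmentation: place the entropy in front of the example.

    The hex encoding and <entropy> tags are hypothetical choices.
    """
    noise = " ".join(chunk.hex() for chunk in entropy)
    return f"<entropy>{noise}</entropy>\n{example}"


def augment_batch(
    batch: Sequence[str],
    sources: Sequence[EntropySource],
    n_bytes: int = 8,
) -> list[str]:
    """Give every example in a batch its own entropy prefix."""
    return [augment_context(ex, sample_entropy(sources, n_bytes)) for ex in batch]


# os.urandom is classical OS randomness, used here only as a placeholder.
sources = [os.urandom, os.urandom, os.urandom]
batch = ["Q: 2 + 2 = ?\nA: 4", "Q: Capital of France?\nA: Paris"]
for text in augment_batch(batch, sources):
    print(text)
```

Each printed example starts with a different random prefix, so the same training text never appears twice with identical surrounding noise.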

Why Three Quantum Sources?

Each source provides entropy with distinct temporal characteristics:

- Superconducting qubits (microsecond coherence) → fast-frequency robustness
- Vacuum fluctuations (nanosecond EM noise) → high-frequency filtering
- Radioactive decay (Poissonian distribution) → deep unpredictability patterns

Combined, they create multi-scale regularization impossible to achieve with classical pseudo-random generators.

🧬 The Hypnos Family

| Model | Parameters | Quantum Sources | Best For | Status |
|---|---|---|---|---|
| Hypnos-Colossus-1T | 1T (MoE) | 3 (IBM + IQM + Cosmic) | Deep Simulation, Grand Challenges | 🌌 Flagship |
| Hypnos-i2-32B | 32B | 3 (Matter + Light + Nucleus) | Production, Research | ✅ Stable |
| Hypnos-i1-8B | 8B | 1 (Matter only) | Edge, Experiments | ✅ 10k+ Downloads |

Which one to choose?

- Colossus 1T: For when you need maximum reasoning depth.
- i2-32B: The "Giant Killer", the best balance of logic and efficiency for consumer GPUs.
- i1-8B: Perfect for laptops and rapid prototyping.

💻 Quick Start

Installation

pip install transformers torch accelerate

Basic Usage

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_name = "squ11z1/Hypnos-i2-32B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

prompt = "Explain the concept of quantum regularization:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    temperature=0.7,
    top_p=0.9,
    do_sample=True
)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Quantized Inference (Recommended)

For consumer GPUs, use 4-bit quantization (~20GB VRAM):

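# Requires the bitsandbytes package: pip install bitsandbytes
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

model_name = "squ11z1/Hypnos-i2-32B"

# NF4 4-bit weights with double quantization; compute runs in bfloat16.
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4"
)

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=quantization_config,
    device_map="auto"
)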

Hardware Requirements:

- BF16 (unquantized): about 64GB VRAM for the weights alone (e.g. an 80GB A100/H100, or several smaller GPUs)
- 4-bit quantized: about 20GB VRAM (RTX 3090/4090, A6000)
- System RAM: 32GB+ recommended
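
These VRAM figures follow from simple arithmetic on the parameter count. A rough sketch, assuming a round 32 billion parameters and counting weights only; the KV cache, activations, and quantization metadata come on top, which is why the 4-bit figure above is closer to 20GB than 16GB:

```python
def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


N_PARAMS = 32e9  # assumption: a round 32B parameters

for label, bits in [("BF16", 16), ("INT8", 8), ("4-bit", 4)]:
    print(f"{label}: {weight_memory_gb(N_PARAMS, bits):.0f} GB")
```

This prints 64 GB for BF16, 32 GB for INT8, and 16 GB for 4-bit weights.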

⚛️ Quantum-Reasoning Capabilities

As a Quantum-Reasoning Engine, Hypnos-i2 transitions beyond standard text generation into high-fidelity logical simulation. Its Multi-Physical Entropy architecture enables it to excel in high-stakes, precision-critical environments:

🌌 High-Fidelity Logic Chains - Executes multi-step reasoning with "quantum" precision, maintaining coherence across long deduction paths (AIME/NuminaMath optimized).
🔬 First-Principles Modeling - Synthesizes complex scientific data into accurate explanations, treating empirical facts as immutable constraints (SciBench grounded).
🛡️ Low-Entropy Stability - Exhibits exceptional resistance to adversarial noise, prompt injection, and logical degradation, maintaining state stability.
⚡ Algorithmic Synthesis - Generates highly optimized, functional code structures, prioritizing execution efficiency over generic boilerplate (CodeForces competitive).
🌐 Cross-Domain Entanglement - Seamlessly connects concepts across 20+ languages and distinct disciplines (e.g., Physics ↔ Poetry), preserving semantic integrity.
🔮 Coherent Narrative Simulation - Generates creative outputs that adhere to strict internal logic and continuity, simulating scenarios with realistic causality.

📚 Training Details

Architecture: Qwen3-32B (32 billion parameters)
Training Method: Full fine-tuning with quantum-augmented contexts
Quantum Sources:
- IBM Quantum Heron (superconducting qubits)
- ANU QRNG (vacuum fluctuations)
- Fourmilab HotBits (radioactive decay)
Regularization: Input-level entropy injection per training example
Context Length: 32,768 tokens
Precision: BF16 training, supports INT4/INT8 quantization