Hypnos-i2-32B (Multi-Source Quantum Reasoning Model)
Quantum-Reasoning Engine. The first 32B model trained on Multi-Physical Entropy (Superconductors + Vacuum + Nuclear Decay).
Built by scientists, for scientists.
🌌 Overview
Hypnos-i2-32B represents a breakthrough in language model training: the world's first 32B-parameter model trained with Input-Level Quantum Regularization drawn from three independent quantum entropy sources.
Unlike traditional LLMs that rely purely on pseudo-random noise during training, Hypnos-i2 learns from true quantum randomness extracted from:
- MATTER: Superconducting qubit decoherence (IBM Quantum Heron, 133-qubit processors)
- LIGHT: Quantum vacuum fluctuations (ANU Quantum Random Number Generator)
- NUCLEUS: Radioactive decay timing (Fourmilab HotBits, Strontium-90)
This is intended to produce attention mechanisms that are more robust to adversarial perturbations and more resistant to mode collapse.
🚀 Key Features
- 32B Parameters — Based on Qwen3-32B architecture
- Multi-QPU Training — Three orthogonal quantum entropy sources
- Input-Level Regularization — Quantum noise embedded in training contexts
- Enhanced Robustness — Improved adversarial resistance and reduced repetition
- Production-Ready — Full fine-tuning with quantum-augmented data
📊 Performance Highlights
Core Capabilities
| Benchmark | Hypnos-i2-32B | Qwen3-32B Base | Delta |
|---|---|---|---|
| ArenaHard | 94.9 | 93.8 | +1.1 |
| AIME '24 | 86.2 | 81.4 | +4.8 |
| AIME '25 | 79.5 | 72.9 | +6.6 |
| LiveBench | 64.1 | 49.3 | +14.8 |
| CodeForces | 2045 | 1977 | +68 |
Robustness Metrics
| Benchmark | Discipline | Hypnos-i2-32B | Qwen3-32B Base | Llama-3.1-405B | Mistral-Large-2411 | DeepSeek-R1 | Llama 4 Maverick |
|---|---|---|---|---|---|---|---|
| Hallucination | Safety | 2.3% | 5.9% | 5.2% | 4.5% | 14.3% | 8.2% |
Multi-Physical Entropy training drastically reduces the model's tendency to fabricate information.
🔬 Technical Innovation: Quantum Regularization
The Problem
Traditional language models suffer from:
- Mode collapse — repetitive, looping outputs
- Adversarial vulnerability — susceptibility to prompt injection
- Overfitting — limited generalization to novel scenarios
The Solution
Input-Level Quantum Entropy Injection works in four steps:
1. Quantum Sampling: Before each training batch, unique entropy sequences are drawn from all three quantum sources
2. Context Augmentation: These sequences are embedded into the context window of training examples
3. Attention Learning: The model learns to distinguish signal (reasoning patterns) from quantum noise
4. Emergent Robustness: Attention heads develop resistance to high-entropy perturbations
This creates a regularization effect similar to Dropout, but data-driven and grounded in fundamental physics rather than architecture hacks.
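The sampling and context-augmentation steps above can be sketched roughly as follows. This is a minimal illustration, not the authors' training pipeline: the source does not say how entropy sequences are encoded or where they are placed in the context, so the hex encoding, the `<entropy>` delimiter, and every function name here are assumptions. `secrets.token_bytes` stands in for the three quantum sources.

```python
import secrets
from typing import Callable, Dict, List

# A source takes a byte count and returns that many random bytes.
EntropySource = Callable[[int], bytes]


def placeholder_source(n_bytes: int) -> bytes:
    """Stand-in for a quantum source (ASSUMPTION: uses the OS CSPRNG)."""
    return secrets.token_bytes(n_bytes)


# Hypothetical registry mirroring the three sources named in the text.
SOURCES: Dict[str, EntropySource] = {
    "matter": placeholder_source,   # superconducting qubits
    "light": placeholder_source,    # vacuum fluctuations
    "nucleus": placeholder_source,  # radioactive decay
}


def sample_entropy(n_bytes: int = 8) -> Dict[str, str]:
    """Draw one hex-encoded sequence from every source."""
    return {name: source(n_bytes).hex() for name, source in SOURCES.items()}


def augment_example(text: str, n_bytes: int = 8) -> str:
    """Prepend an entropy header to one training example (hypothetical format)."""
    entropy = sample_entropy(n_bytes)
    header = " ".join(f"{name}={value}" for name, value in entropy.items())
    return f"<entropy>{header}</entropy>\n{text}"


def augment_batch(batch: List[str], n_bytes: int = 8) -> List[str]:
    """Give every example in the batch its own freshly sampled header."""
    return [augment_example(text, n_bytes) for text in batch]


if __name__ == "__main__":
    for line in augment_batch(["2 + 2 = 4", "The capital of France is Paris."]):
        print(line)
```

Each example receives a different header, so the only stable signal across the dataset is the text itself. That is the sense in which the model has to learn to ignore the injected noise.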
Why Three Quantum Sources?
Each source provides entropy with distinct temporal characteristics:
- Superconducting qubits (microsecond coherence) → fast-frequency robustness
- Vacuum fluctuations (nanosecond EM noise) → high-frequency filtering
- Radioactive decay (Poissonian distribution) → deep unpredictability patterns
Combined, they create multi-scale regularization impossible to achieve with classical pseudo-random generators.
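To make the radioactive-decay item concrete: decay events from a steady source follow a Poisson process, so the gaps between events are exponentially distributed. A common way to turn such timings into bits is to compare two successive gaps. The sketch below simulates this with Python's `random.expovariate`. It is a pseudo-random simulation for illustration only, not real quantum entropy, and the rate and function names are assumptions rather than details from the Hypnos pipeline.

```python
import random
from typing import List


def simulated_decay_gaps(rate_hz: float, n: int, rng: random.Random) -> List[float]:
    """Simulate n inter-arrival times of a Poisson process with the given rate."""
    return [rng.expovariate(rate_hz) for _ in range(n)]


def bits_from_gaps(gaps: List[float]) -> List[int]:
    """Turn non-overlapping pairs of gaps into bits.

    A pair yields 1 if the first gap is longer, 0 if it is shorter.
    Ties are discarded, as is a trailing unpaired gap.
    """
    bits = []
    for first, second in zip(gaps[0::2], gaps[1::2]):
        if first != second:
            bits.append(1 if first > second else 0)
    return bits


if __name__ == "__main__":
    rng = random.Random(0)  # seeded only so the demo is repeatable
    gaps = simulated_decay_gaps(rate_hz=1000.0, n=2000, rng=rng)
    bits = bits_from_gaps(gaps)
    print(f"{len(bits)} bits, fraction of ones = {sum(bits) / len(bits):.3f}")
```

Because both gaps in a pair come from the same distribution, each comparison is equally likely to go either way, so the fraction of ones should sit close to 0.5.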
🧬 The Hypnos Family
| Model | Parameters | Quantum Sources | Best For | Status |
|---|---|---|---|---|
| Hypnos-Colossus-1T | 1T (MoE) | 3 (IBM + IQM + Cosmic) | Deep Simulation, Grand Challenges | 🌌 Flagship |
| Hypnos-i2-32B | 32B | 3 (Matter + Light + Nucleus) | Production, Research | ✅ Stable |
| Hypnos-i1-8B | 8B | 1 (Matter only) | Edge, Experiments | ✅ 10k+ Downloads |
Which one to choose?
- Colossus 1T: For when you need maximum reasoning depth.
- i2-32B: The "Giant Killer" - best balance of logic and efficiency for consumer GPUs.
- i1-8B: Perfect for laptops and rapid prototyping.
💻 Quick Start
Installation
pip install transformers torch accelerate
Basic Usage
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_name = "squ11z1/Hypnos-i2-32B"

# Load the tokenizer and the BF16 weights, spreading layers across available devices.
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

prompt = "Explain the concept of quantum regularization:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Sample up to 512 new tokens with nucleus sampling.
outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    temperature=0.7,
    top_p=0.9,
    do_sample=True
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
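Because the base model is Qwen3-32B, a chat model, prompts are usually wrapped in the tokenizer's chat template rather than passed as raw text. The sketch below assumes the Hypnos tokenizer ships such a template, which the source does not confirm. `build_messages` and `chat` are illustrative helper names, and `model` and `tokenizer` are the objects loaded in Basic Usage.

```python
from typing import Dict, List, Optional


def build_messages(
    user_prompt: str, system_prompt: Optional[str] = None
) -> List[Dict[str, str]]:
    """Build a chat-style message list with an optional system prompt."""
    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": user_prompt})
    return messages


def chat(model, tokenizer, user_prompt: str, max_new_tokens: int = 512) -> str:
    """Generate one reply using the tokenizer's chat template.

    ASSUMPTION: the tokenizer defines a chat template, as Qwen3 tokenizers do.
    """
    inputs = tokenizer.apply_chat_template(
        build_messages(user_prompt),
        add_generation_prompt=True,  # append the assistant-turn prefix
        return_dict=True,
        return_tensors="pt",
    ).to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Keep only the newly generated tokens, dropping the echoed prompt.
    new_tokens = output_ids[0][inputs["input_ids"].shape[-1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

A call would look like `print(chat(model, tokenizer, "Explain the concept of quantum regularization."))`.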
Quantized Inference (Recommended)
For consumer GPUs, use 4-bit quantization (~20GB VRAM):
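# Requires the bitsandbytes package: pip install bitsandbytes
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

model_name = "squ11z1/Hypnos-i2-32B"

# 4-bit NF4 weights with double quantization; matrix multiplies run in bfloat16.
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4"
)

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=quantization_config,
    device_map="auto"
)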
Hardware Requirements:
- BF16 (16-bit, unquantized): about 64GB VRAM for the weights alone (A100/H100 80GB)
- 4-bit quantized: about 20GB VRAM (RTX 3090/4090, A6000)
- RAM: 32GB+ recommended
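These VRAM figures follow from simple arithmetic on weight storage: parameter count times bytes per parameter. The sketch below reproduces that estimate. It starts from the weights only, and the 25% overhead factor for quantization metadata, activations, and KV cache is an assumed round number, not a measurement.

```python
def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    """Memory needed to store the weights alone, in decimal gigabytes."""
    return n_params * bits_per_param / 8 / 1e9


def estimated_vram_gb(
    n_params: float, bits_per_param: float, overhead: float = 0.25
) -> float:
    """Weights plus a flat overhead fraction (ASSUMPTION: 25% by default)."""
    return weight_memory_gb(n_params, bits_per_param) * (1 + overhead)


if __name__ == "__main__":
    n_params = 32e9  # nominal 32B parameters
    print(f"BF16 weights:  {weight_memory_gb(n_params, 16):.1f} GB")
    print(f"4-bit weights: {weight_memory_gb(n_params, 4):.1f} GB")
    print(f"4-bit with overhead: {estimated_vram_gb(n_params, 4):.1f} GB")
```

Actual usage grows with context length and batch size, so treat the result as a lower bound when choosing a GPU.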
⚛️ Quantum-Reasoning Capabilities
As a Quantum-Reasoning Engine, Hypnos-i2 transitions beyond standard text generation into high-fidelity logical simulation. Its Multi-Physical Entropy architecture enables it to excel in high-stakes, precision-critical environments:
- 🌌 High-Fidelity Logic Chains - Executes multi-step reasoning with "quantum" precision, maintaining coherence across long deduction paths (AIME/NuminaMath optimized).
- 🔬 First-Principles Modeling - Synthesizes complex scientific data into accurate explanations, treating empirical facts as immutable constraints (SciBench grounded).
- 🛡️ Low-Entropy Stability - Exhibits exceptional resistance to adversarial noise, prompt injection, and logical degradation, maintaining state stability.
- ⚡ Algorithmic Synthesis - Generates highly optimized, functional code structures, prioritizing execution efficiency over generic boilerplate (CodeForces competitive).
- 🌐 Cross-Domain Entanglement - Seamlessly connects concepts across 20+ languages and distinct disciplines (e.g., Physics ↔ Poetry), preserving semantic integrity.
- 🔮 Coherent Narrative Simulation - Generates creative outputs that adhere to strict internal logic and continuity, simulating scenarios with realistic causality.
📚 Training Details
- Architecture: Qwen3-32B (32 billion parameters)
- Training Method: Full fine-tuning with quantum-augmented contexts
- Quantum Sources:
  - IBM Quantum Heron (superconducting qubits)
  - ANU QRNG (vacuum fluctuations)
  - Fourmilab HotBits (radioactive decay)
- Regularization: Input-level entropy injection per training example
- Context Length: 32,768 tokens
- Precision: BF16 training, supports INT4/INT8 quantization
🙏 Acknowledgments
- IBM Quantum — Superconducting qubit entropy access
- ANU Centre for Quantum Computation — Vacuum fluctuation QRNG
- Fourmilab — Radioactive decay entropy (HotBits)
Special thanks to 1,000+ Hypnos-i1 users for feedback!
📜 License
Apache 2.0 — Commercial use permitted with attribution.