NYXIS-1.1B — Identity-Aligned Lightweight Language Model by QuantaSparkLabs
All New NYXIS 2B!
This repository contains the fully merged model weights (not just LoRA adapters). They are compatible with 🤗 Transformers, vLLM, Text Generation Inference, Unsloth, and custom pipelines.
Note: the Featherless AI inference provider has not yet updated its servers and model weights, so some features or responses may be broken or unstable when served from there.
📋 Overview
NYXIS-1.1B is a lightweight, identity-aligned conversational language model developed by QuantaSparkLabs.
It is fine-tuned from Qwen2.5-1.5B-Instruct using QLoRA + Unsloth on a custom curated dataset, with all training done on a T4 GPU.
NYXIS is designed for stable persona consistency, instruction following, web-search tool calling, and efficient edge deployment — all while keeping a tiny VRAM footprint.
🎯 Design Goals
| 🎯 Goal | 📌 Detail |
|---|---|
| 🪪 Identity Alignment | Consistent "I'm NYXIS, created by QuantaSparkLabs" across all contexts |
| 🌐 Tool Calling | Trained web-search function-call pattern built in |
| ⚡ Efficiency | Runs on T4 / 8GB VRAM without quantization tricks |
| 🔧 Plug & Play | Fully merged weights — no adapter loading needed |
| 🧠 Knowledge Retention | Custom dataset preserves Qwen2.5 base knowledge |
✨ Core Capabilities
| Capability | Description |
|---|---|
| 🧠 Conversational AI | Chat-optimized with Qwen2.5 <|im_start|> / <|im_end|> template |
| 🪪 Identity Alignment | Consistent "NYXIS by QuantaSparkLabs" persona under all prompts |
| 📚 Instruction Following | Supports reasoning, explanation, summarization, and coding |
| 🌐 Web Search Tool | Emits web_search(query) function calls when external info is needed |
| ⚡ Lightweight | Runs on 6–8 GB VRAM in FP16 |
| 🔧 Fully Merged Weights | Standalone model — no LoRA adapter required at runtime |
🏗️ Model Architecture
🔩 Base Model
| Field | Value |
|---|---|
| Backbone | Qwen/Qwen2.5-1.5B-Instruct |
| Framework | Hugging Face Transformers + Unsloth |
| Fine-tuning | QLoRA (rank 16) → Full Weight Merge |
| Chat Template | Qwen2.5 ChatML (<|im_start|> / <|im_end|>) |
🔄 Training Pipeline
Qwen2.5-1.5B-Instruct (Base)
↓
QLoRA Fine-Tuning
(rank 16, Unsloth)
↓
Custom 500-example
Identity + Chat + Tool Dataset
↓
Full Weight Merge
(adapter baked into model)
↓
NYXIS-1.1B — Deployed on HuggingFace 🚀
📊 Technical Specifications
| ⚙️ Parameter | 📌 Value |
|---|---|
| Model Name | NYXIS-1.1B |
| Organization | QuantaSparkLabs |
| Base Model | Qwen/Qwen2.5-1.5B-Instruct |
| Total Parameters | ~1.56 Billion |
| Trainable Parameters | 18.5M (1.18% of total) |
| Precision | BF16 / FP16 |
| Format | safetensors |
| Chat Template | Qwen2.5 ChatML (Jinja) |
| Inference Mode | Causal LM |
| File Size | ~2.0–2.2 GB |
🧬 Training Details
⚡ Fine-Tuning Method
| 🔬 Setting | 📌 Value |
|---|---|
| Technique | QLoRA (Quantized Low-Rank Adaptation) |
| Library | Unsloth |
| LoRA Rank | 16 |
| Optimizer | AdamW (paged) |
| Learning Rate | 2e-4 |
| Epochs | 3 |
| Total Steps | 189 |
| Batch Size | 8 (2 per device × 4 grad accumulation) |
| Hardware | T4 GPU |
| Final Training Loss | ~0.08 ✅ |
| Merge Strategy | Full weight merge — adapter baked in |
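The step count in the table can be cross-checked from the other settings. The sketch below is only arithmetic on the numbers stated in this card; it assumes a single training device and that the final partial batch of each epoch still counts as one optimizer step, neither of which the card states explicitly.

```python
import math

# Values taken from the tables in this card
num_examples = 500       # dataset size
per_device_batch = 2     # examples per device per forward pass
grad_accum_steps = 4     # gradient accumulation steps
epochs = 3

# Assumption: one device, so the effective batch is per-device batch x accumulation
effective_batch = per_device_batch * grad_accum_steps

# Assumption: the last, partially filled batch of an epoch still triggers a step
steps_per_epoch = math.ceil(num_examples / effective_batch)
total_steps = steps_per_epoch * epochs

print(effective_batch, steps_per_epoch, total_steps)  # 8 63 189
```

Under those assumptions the result matches the 189 total steps reported above.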
📂 Dataset Composition (500 examples)
| 🗂️ Category | 📊 Proportion | 📝 Description |
|---|---|---|
| 🪪 Identity | 10% (50 examples) | Responses in which the model states its identity (NYXIS, created by QuantaSparkLabs) |
| 💬 Open Chat | 70% (350 examples) | Diverse assistant responses — science, jokes, coding, daily life, etc. |
| 🌐 Web Search Tool | 20% (100 examples) | Function-calling pattern: model requests web_search(query) when it needs external info |
The dataset was custom-built to preserve Qwen2.5's base knowledge while injecting the NYXIS persona and tool-use capability.
💻 Quick Start
🔧 Installation
# Option A: Standard Transformers
pip install transformers accelerate torch
# Option B: Unsloth (recommended for speed + memory efficiency)
pip install unsloth
🚀 Load & Chat — Transformers
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

MODEL_ID = "QuantaSparkLabs/NYXIS-1.1B"

# Load the tokenizer and the merged weights in FP16.
# device_map="auto" relies on the accelerate package installed above.
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.float16,
    device_map="auto"
)
model.eval()

messages = [
    {"role": "system", "content": "You are NYXIS, a helpful AI created by QuantaSparkLabs."},
    {"role": "user", "content": "Hello NYXIS! Who are you?"}
]

# Render the conversation with the model's ChatML template
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=150,
        do_sample=True,  # temperature and top_p only take effect when sampling is enabled
        temperature=0.6,
        top_p=0.9,
        repetition_penalty=1.15,
        no_repeat_ngram_size=3,
        pad_token_id=tokenizer.eos_token_id,
        eos_token_id=tokenizer.eos_token_id
    )

# Decode only the newly generated tokens, skipping the prompt
response = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1]:],
    skip_special_tokens=True
)
print("NYXIS:", response)
⚡ Load with Unsloth (Recommended)
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="QuantaSparkLabs/NYXIS-1.1B",
    max_seq_length=2048,
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)
🖊️ Manual Qwen2.5 Chat Prompt Format
NYXIS uses the standard Qwen2.5 ChatML tokens. Build your prompt manually like this:
messages = [
    {"role": "system", "content": "You are NYXIS, a helpful AI created by QuantaSparkLabs."},
    {"role": "user", "content": "What is a black hole?"}
]

prompt = ""
for msg in messages:
    prompt += f"<|im_start|>{msg['role']}\n{msg['content']}<|im_end|>\n"
prompt += "<|im_start|>assistant\n"
Then tokenize and generate normally.
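The manual loop above can be wrapped in a small helper so it is reusable and easy to test. This is a minimal sketch: `build_chatml_prompt` is a hypothetical name, not part of this repository, and its output may differ from `tokenizer.apply_chat_template` in edge cases the shipped Jinja template handles (for example tool messages).

```python
def build_chatml_prompt(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts as a ChatML prompt string."""
    parts = []
    for msg in messages:
        # Each turn is wrapped in <|im_start|>role ... <|im_end|>
        parts.append(f"<|im_start|>{msg['role']}\n{msg['content']}<|im_end|>\n")
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


example_prompt = build_chatml_prompt([
    {"role": "system", "content": "You are NYXIS, a helpful AI created by QuantaSparkLabs."},
    {"role": "user", "content": "What is a black hole?"},
])
print(example_prompt)
```

The returned string can be passed to the tokenizer in the same way as the `prompt` variable in the Quick Start example.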
🌐 Web Search Tool Pattern
When a system prompt mentions that a web_search tool is available, NYXIS may emit a function call instead of answering directly:
<|im_start|>assistant
[{"type": "function", "function": {"name": "web_search", "arguments": {"query": "latest news on AI"}}}]
<|im_end|>
You can intercept this, run an actual search, and feed the result back as a tool message to get the final answer.
⚠️ The web-search pattern is trained behaviour only — it does not include a live search engine.
You need to implement the tool runner yourself (e.g. using SerpAPI, DuckDuckGo, or Tavily).
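One possible shape for such a tool runner is sketched below. It parses the JSON list shown above, dispatches to a search function, and appends the result as a tool message. Everything here is illustrative: `extract_tool_calls`, `run_web_search`, `TOOLS`, and `handle_assistant_turn` are hypothetical names, the search function is a stub with no real backend, and the sketch assumes the chat template accepts a message with role `tool`.

```python
import json


def extract_tool_calls(assistant_text):
    """Return a list of (name, arguments) pairs, or [] for a plain-text answer."""
    text = assistant_text.replace("<|im_end|>", "").strip()
    try:
        payload = json.loads(text)
    except json.JSONDecodeError:
        return []  # not JSON, so treat it as a normal answer
    if not isinstance(payload, list):
        return []
    calls = []
    for item in payload:
        if not isinstance(item, dict) or item.get("type") != "function":
            continue
        fn = item.get("function")
        if isinstance(fn, dict) and "name" in fn:
            args = fn.get("arguments", {})
            if isinstance(args, dict):
                calls.append((fn["name"], args))
    return calls


def run_web_search(arguments):
    """Hypothetical stub: swap in a real backend (SerpAPI, DuckDuckGo, Tavily, ...)."""
    query = arguments.get("query", "")
    return f"(stub) no live search configured for: {query}"


# Registry of available tools, keyed by the function name the model emits
TOOLS = {"web_search": run_web_search}


def handle_assistant_turn(messages, assistant_text):
    """Append the assistant turn and any tool results; report whether a tool ran."""
    new_messages = messages + [{"role": "assistant", "content": assistant_text}]
    calls = extract_tool_calls(assistant_text)
    for name, args in calls:
        tool = TOOLS.get(name)
        result = tool(args) if tool else f"unknown tool: {name}"
        new_messages.append({"role": "tool", "content": result})
    return new_messages, bool(calls)


sample = '[{"type": "function", "function": {"name": "web_search", "arguments": {"query": "latest news on AI"}}}]\n<|im_end|>'
updated, tool_ran = handle_assistant_turn([], sample)
print(tool_ran, updated[-1])
```

When `tool_ran` is true, the updated message list can be rendered with the chat template again and passed back to `model.generate` to obtain the final answer.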
⚡ Hardware Requirements
| 🖥️ Hardware | 🚦 Performance |
|---|---|
| T4 GPU (16GB) | ✅ Optimal — trained on this |
| RTX 3060 (12GB) | ✅ Smooth FP16 |
| 8GB VRAM GPU | ⚠️ Usable — FP16 recommended |
| 4GB VRAM GPU | 🔶 Use 4-bit via Unsloth / BitsAndBytes |
| CPU Only | 🐌 Slow but functional |
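The table can be turned into a simple loading rule. The helper below is a sketch only: `pick_load_mode` is a hypothetical name, and the 6 GB and 4 GB thresholds are one reading of the figures in this card (FP16 on 6–8 GB, 4-bit on 4 GB), not measured limits.

```python
def pick_load_mode(vram_gb):
    """Suggest a loading mode from available GPU memory in GB (None means no GPU)."""
    if vram_gb is None or vram_gb < 4:
        return "cpu"    # no GPU, or too little memory for the 4-bit row in the table
    if vram_gb >= 6:
        return "fp16"   # assumption: the 6-8 GB FP16 figure from Core Capabilities
    return "4bit"       # 4 GB class GPUs: quantize via Unsloth or BitsAndBytes


for vram in (16, 12, 8, 4, None):
    print(vram, pick_load_mode(vram))
```

A result of `"4bit"` corresponds to passing `load_in_4bit=True` as in the Unsloth example, while `"fp16"` corresponds to `torch_dtype=torch.float16` in the Transformers example.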
📁 Repository Structure
NYXIS-1.1B/
├── model.safetensors # Full merged weights (~2.2 GB)
├── config.json # Model architecture config
├── tokenizer.json # Qwen2.5 tokenizer
├── tokenizer_config.json # Chat template config
├── generation_config.json # Default generation settings
├── chat_template.jinja # Jinja chat template
└── README.md
⚠️ Known Limitations
| ⚠️ Issue | 📝 Notes |
|---|---|
| 🔁 Hallucination | May occasionally hallucinate or oversimplify (1.5B scale) |
| 🗣️ Identity Bias | May append "How can I help you today?" — reduce via system prompt tuning |
| 🔢 Math Reasoning | Limited complex math ability (small model) |
| 🌍 Language | Primarily English-focused |
| 🚫 Critical Use | Not suitable for medical, legal, or safety-critical applications |
| 🔍 Web Search | Tool pattern only — no live search engine included |
🔒 Safety & Alignment
NYXIS is trained with:
- ✅ Identity alignment dataset (consistent persona)
- ✅ Instruction-balanced samples (diverse and safe)
- ✅ Controlled decoding configuration (anti-loop)
Recommended generation settings:
temperature = 0.6
top_p = 0.9
repetition_penalty = 1.1   # anywhere from 1.1 to 1.2
no_repeat_ngram_size = 3
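These settings can be kept in one place and unpacked into `model.generate`. The sketch below uses hypothetical names (`RECOMMENDED_GENERATION`, `generation_kwargs`) and adds `do_sample=True`, which is not listed above but is required in Transformers for `temperature` and `top_p` to have any effect. The default of 1.15 for `repetition_penalty` follows the Quick Start example.

```python
# Recommended decoding settings from this card, plus do_sample (an addition, see above)
RECOMMENDED_GENERATION = {
    "do_sample": True,
    "temperature": 0.6,
    "top_p": 0.9,
    "repetition_penalty": 1.15,
    "no_repeat_ngram_size": 3,
}


def generation_kwargs(repetition_penalty=1.15, **overrides):
    """Return generate() keyword arguments, keeping the penalty in the 1.1 to 1.2 range."""
    if not 1.1 <= repetition_penalty <= 1.2:
        raise ValueError("repetition_penalty outside the recommended 1.1 to 1.2 range")
    kwargs = dict(RECOMMENDED_GENERATION, repetition_penalty=repetition_penalty)
    kwargs.update(overrides)  # caller-supplied settings win, e.g. max_new_tokens
    return kwargs


print(generation_kwargs(max_new_tokens=150))
```

With a loaded model, this would be used as `model.generate(**inputs, **generation_kwargs(max_new_tokens=150))`.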
🚀 Version History
| 🏷️ Version | 📅 Date | 📝 Notes |
|---|---|---|
| v1.0 | Early 2025 | Initial LoRA fine-tune on TinyLlama |
| v1.1 (NYXIS 2.1) | 2025 | Rebuilt on Qwen2.5-1.5B-Instruct · QLoRA · Unsloth · 500 examples · Web-search tool · Full merge · HF deployment |
📜 License
This model is licensed under the Apache 2.0 License,
following the original Qwen2.5-1.5B-Instruct license terms.
NYXIS • Built by QuantaSparkLabs • 2025–2026
Lightweight • Identity-Aligned • Efficient • Open Source
If you find NYXIS useful, give the repo a ❤️ and share your creations!