Instructions for using QuantaSparkLabs/NYXIS-1.1B with libraries, inference providers, notebooks, and local apps.

Libraries

How to use QuantaSparkLabs/NYXIS-1.1B with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="QuantaSparkLabs/NYXIS-1.1B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("QuantaSparkLabs/NYXIS-1.1B")
model = AutoModelForCausalLM.from_pretrained("QuantaSparkLabs/NYXIS-1.1B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use QuantaSparkLabs/NYXIS-1.1B with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "QuantaSparkLabs/NYXIS-1.1B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "QuantaSparkLabs/NYXIS-1.1B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
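The same request can be sent from Python. The following is a minimal sketch using only the standard library; the helper names `build_chat_request` and `chat` are illustrative and not part of vLLM, and the response parsing assumes the usual OpenAI-compatible `choices[0].message.content` layout.

```python
import json
import urllib.request


def build_chat_request(base_url, model, user_text):
    """Build a POST request for an OpenAI-compatible /v1/chat/completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(base_url, model, user_text, timeout=60):
    """Send the request and return the assistant's reply text."""
    req = build_chat_request(base_url, model, user_text)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


# Example (requires the server started above to be running):
# print(chat("http://localhost:8000", "QuantaSparkLabs/NYXIS-1.1B",
#            "What is the capital of France?"))
```

Because the base URL is a parameter, the same helper should also work against the SGLang server described in the next section by pointing it at port 30000.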

Use Docker

docker model run hf.co/QuantaSparkLabs/NYXIS-1.1B

SGLang

How to use QuantaSparkLabs/NYXIS-1.1B with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "QuantaSparkLabs/NYXIS-1.1B" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "QuantaSparkLabs/NYXIS-1.1B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "QuantaSparkLabs/NYXIS-1.1B" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "QuantaSparkLabs/NYXIS-1.1B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Unsloth Studio

How to use QuantaSparkLabs/NYXIS-1.1B with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for QuantaSparkLabs/NYXIS-1.1B to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for QuantaSparkLabs/NYXIS-1.1B to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for QuantaSparkLabs/NYXIS-1.1B to start chatting

Load model with FastModel

# Install first (run in a shell, not in Python): pip install unsloth
from unsloth import FastModel
model, tokenizer = FastModel.from_pretrained(
    model_name="QuantaSparkLabs/NYXIS-1.1B",
    max_seq_length=2048,
)

Docker Model Runner
How to use QuantaSparkLabs/NYXIS-1.1B with Docker Model Runner:
```
docker model run hf.co/QuantaSparkLabs/NYXIS-1.1B
```
Browse Quantizations to use this model in llama.cpp, Ollama, LM Studio, or any compatible app.

A newer version of this model is available: QuantaSparkLabs/NYXIS-Pro

NYXIS-1.1B — Identity-Aligned Lightweight Language Model by QuantaSparkLabs

All New NYXIS 2B!

This repository contains the fully merged model weights (not just LoRA adapters) and is compatible with 🤗 Transformers, vLLM, Text Generation Inference, Unsloth, and custom pipelines.

Note: the Featherless AI inference provider has not yet updated its servers to the current model weights, so responses served through it may be broken or unstable.

📋 Overview

NYXIS-1.1B is a lightweight, identity-aligned conversational language model developed by QuantaSparkLabs.
It is fine-tuned from Qwen2.5-1.5B-Instruct using QLoRA + Unsloth on a custom curated dataset, with all training done on a T4 GPU.

NYXIS is designed for stable persona consistency, instruction following, web-search tool calling, and efficient edge deployment, with a VRAM footprint of roughly 6–8 GB in FP16.

🎯 Design Goals

🎯 Goal	📌 Detail
🪪 Identity Alignment	Consistent "I'm NYXIS, created by QuantaSparkLabs" across all contexts
🌐 Tool Calling	Trained web-search function-call pattern built in
⚡ Efficiency	Runs on T4 / 8GB VRAM without quantization tricks
🔧 Plug & Play	Fully merged weights — no adapter loading needed
🧠 Knowledge Retention	Custom dataset preserves Qwen2.5 base knowledge

✨ Core Capabilities

Capability	Description
🧠 Conversational AI	Chat-optimized with Qwen2.5 `<|im_start|>` / `<|im_end|>` template
🪪 Identity Alignment	Consistent "NYXIS by QuantaSparkLabs" persona under all prompts
📚 Instruction Following	Supports reasoning, explanation, summarization, and coding
🌐 Web Search Tool	Emits `web_search(query)` function calls when external info is needed
⚡ Lightweight	Runs on 6–8 GB VRAM in FP16
🔧 Fully Merged Weights	Standalone model — no LoRA adapter required at runtime

🏗️ Model Architecture

🔩 Base Model

Field	Value
Backbone	`Qwen/Qwen2.5-1.5B-Instruct`
Framework	Hugging Face Transformers + Unsloth
Fine-tuning	QLoRA (rank 16) → Full Weight Merge
Chat Template	Qwen2.5 ChatML (`<|im_start|>` / `<|im_end|>`)

🔄 Training Pipeline

Qwen2.5-1.5B-Instruct (Base)
        ↓
  QLoRA Fine-Tuning
  (rank 16, Unsloth)
        ↓
  Custom 500-example
  Identity + Chat + Tool Dataset
        ↓
  Full Weight Merge
  (adapter baked into model)
        ↓
  NYXIS-1.1B — Deployed on HuggingFace 🚀

📊 Technical Specifications

⚙️ Parameter	📌 Value
Model Name	NYXIS-1.1B
Organization	QuantaSparkLabs
Base Model	`Qwen/Qwen2.5-1.5B-Instruct`
Total Parameters	~1.56 Billion
Trainable Parameters	18.5M (1.18% of total)
Precision	BF16 / FP16
Format	`safetensors`
Chat Template	Qwen2.5 ChatML (Jinja)
Inference Mode	Causal LM
File Size	~2.0–2.2 GB

🧬 Training Details

⚡ Fine-Tuning Method

🔬 Setting	📌 Value
Technique	QLoRA (Quantized Low-Rank Adaptation)
Library	Unsloth
LoRA Rank	16
Optimizer	AdamW (paged)
Learning Rate	`2e-4`
Epochs	3
Total Steps	189
Batch Size	8 (2 per device × 4 grad accumulation)
Hardware	T4 GPU
Final Training Loss	~0.08 ✅
Merge Strategy	Full weight merge — adapter baked in
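
The trainable-parameter figure (18.5M) can be reproduced from the LoRA rank. A rank-r adapter on a linear layer of shape (in, out) adds r × (in + out) weights. The sketch below assumes the adapters were attached to all seven attention and MLP projections in every layer; that is a common Unsloth setup but is not stated in this card. The layer dimensions are those of the Qwen2.5-1.5B architecture.

```python
# Qwen2.5-1.5B shapes: hidden size 1536, MLP size 8960, 28 layers,
# 12 query heads and 2 key/value heads of dimension 128.
HIDDEN, INTERMEDIATE, LAYERS = 1536, 8960, 28
KV_DIM = 2 * 128
RANK = 16  # LoRA rank from the table above

# (in_features, out_features) per adapted projection.
# ASSUMPTION: all seven projections were targeted; the card does not say.
TARGETS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(rank, shapes, layers):
    """LoRA adds A (rank x in) and B (out x rank) to each adapted module."""
    per_layer = sum(rank * (n_in + n_out) for n_in, n_out in shapes.values())
    return per_layer * layers


trainable = lora_params(RANK, TARGETS, LAYERS)
print(trainable)                      # 18464768
print(f"{trainable / 1.56e9:.2%}")    # 1.18%
```

Under that assumption the count comes to about 18.5M, or 1.18% of the ~1.56B total listed in the specifications.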

📂 Dataset Composition (500 examples)

🗂️ Category	📊 Proportion	📝 Description
🪪 Identity	10% (50 examples)	Responses in which the model states its NYXIS identity
💬 Open Chat	70% (350 examples)	Diverse assistant responses — science, jokes, coding, daily life, etc.
🌐 Web Search Tool	20% (100 examples)	Function-calling pattern: model requests `web_search(query)` when it needs external info

The dataset was custom-built to preserve Qwen2.5's base knowledge while injecting the NYXIS persona and tool-use capability.
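
The dataset split and the reported step count are consistent with each other, which the short check below illustrates. It assumes a single GPU and that the final partial batch of each epoch counts as one optimizer step; neither detail is stated explicitly in this card.

```python
import math

# Example counts per category, from the dataset composition table.
DATASET = {"identity": 50, "open_chat": 350, "web_search": 100}
total_examples = sum(DATASET.values())

# Training settings from the fine-tuning table.
per_device_batch, grad_accum, epochs = 2, 4, 3
effective_batch = per_device_batch * grad_accum

# ASSUMPTION: one device, last partial batch kept (ceil rather than floor).
steps_per_epoch = math.ceil(total_examples / effective_batch)
total_steps = steps_per_epoch * epochs

shares = {name: count / total_examples for name, count in DATASET.items()}

print(total_examples, effective_batch, steps_per_epoch, total_steps)  # 500 8 63 189
print(shares)
```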

💻 Quick Start

🔧 Installation

# Option A: Standard Transformers
pip install transformers accelerate torch

# Option B: Unsloth (recommended for speed + memory efficiency)
pip install unsloth

🚀 Load & Chat — Transformers

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

MODEL_ID = "QuantaSparkLabs/NYXIS-1.1B"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.float16,
    device_map="auto"
)
model.eval()

messages = [
    {"role": "system", "content": "You are NYXIS, a helpful AI created by QuantaSparkLabs."},
    {"role": "user", "content": "Hello NYXIS! Who are you?"}
]

prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=150,
        do_sample=True,  # make sampling explicit so temperature/top_p take effect
        temperature=0.6,
        top_p=0.9,
        repetition_penalty=1.15,
        no_repeat_ngram_size=3,
        pad_token_id=tokenizer.eos_token_id,
        eos_token_id=tokenizer.eos_token_id
    )

response = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1]:],
    skip_special_tokens=True
)
print("NYXIS:", response)

⚡ Load with Unsloth (Recommended)

from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="QuantaSparkLabs/NYXIS-1.1B",
    max_seq_length=2048,
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)

🖊️ Manual Qwen2.5 Chat Prompt Format

NYXIS uses the standard Qwen2.5 ChatML tokens. Build your prompt manually like this:

messages = [
    {"role": "system", "content": "You are NYXIS, a helpful AI created by QuantaSparkLabs."},
    {"role": "user", "content": "What is a black hole?"}
]

prompt = ""
for msg in messages:
    prompt += f"<|im_start|>{msg['role']}\n{msg['content']}<|im_end|>\n"
prompt += "<|im_start|>assistant\n"

Then tokenize and generate normally.

🌐 Web Search Tool Pattern

When a system prompt mentions that a `web_search` tool is available, NYXIS may emit a function call instead of answering directly:

<|im_start|>assistant
[{"type": "function", "function": {"name": "web_search", "arguments": {"query": "latest news on AI"}}}]
<|im_end|>

You can intercept this, run an actual search, and feed the result back as a tool message to get the final answer.

⚠️ The web-search pattern is trained behaviour only — it does not include a live search engine.
You need to implement the tool runner yourself (e.g. using SerpAPI, DuckDuckGo, or Tavily).
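
The intercept step could look roughly like the sketch below. It assumes the model emits the JSON list format shown above; `parse_tool_calls`, `run_tools`, and `fake_web_search` are hypothetical helper names, and the stub search function stands in for whichever real search backend you choose. The exact shape of the tool message that the chat template expects is also an assumption and may need adjusting.

```python
import json


def parse_tool_calls(assistant_text):
    """Return [(name, arguments), ...], or [] if the text is a plain answer."""
    text = assistant_text.replace("<|im_end|>", "").strip()
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError:
        return []  # not JSON, so treat it as a normal reply
    if not isinstance(parsed, list):
        return []
    calls = []
    for item in parsed:
        if not isinstance(item, dict) or item.get("type") != "function":
            continue
        fn = item.get("function")
        if isinstance(fn, dict) and isinstance(fn.get("arguments"), dict):
            calls.append((fn.get("name"), fn["arguments"]))
    return calls


def run_tools(assistant_text, tools):
    """Run each recognised call and return tool-role messages for the chat."""
    messages = []
    for name, args in parse_tool_calls(assistant_text):
        if name in tools:
            result = tools[name](**args)
            messages.append({"role": "tool", "name": name, "content": str(result)})
    return messages


def fake_web_search(query):
    """Stub for illustration only; replace with a real search client."""
    return f"(stub) results for: {query}"


raw = ('[{"type": "function", "function": {"name": "web_search", '
       '"arguments": {"query": "latest news on AI"}}}]\n<|im_end|>')
print(run_tools(raw, {"web_search": fake_web_search}))
```

Appending the returned messages to the conversation and generating again should let the model produce its final answer from the search result.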

⚡ Hardware Requirements

🖥️ Hardware	🚦 Performance
T4 GPU (16GB)	✅ Optimal — trained on this
RTX 3060 (12GB)	✅ Smooth FP16
8GB VRAM GPU	⚠️ Usable — FP16 recommended
4GB VRAM GPU	🔶 Use 4-bit via Unsloth / BitsAndBytes
CPU Only	🐌 Slow but functional
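
A rough weights-only estimate helps explain the tiers above: memory is approximately parameter count times bytes per parameter. The sketch uses the ~1.56B parameter figure from the specifications and deliberately ignores the KV cache, activations, and quantization metadata, so real usage will be higher than these numbers.

```python
def weight_gb(n_params, bits_per_param):
    """Approximate memory for the weights alone, in GB (10^9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


N_PARAMS = 1.56e9  # total parameters, from the specifications table

fp16_gb = weight_gb(N_PARAMS, 16)
int4_gb = weight_gb(N_PARAMS, 4)

print(f"FP16/BF16 weights: ~{fp16_gb:.2f} GB")  # ~3.12 GB
print(f"4-bit weights:     ~{int4_gb:.2f} GB")  # ~0.78 GB
```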

📁 Repository Structure

NYXIS-1.1B/
├── model.safetensors        # Full merged weights (~2.2 GB)
├── config.json              # Model architecture config
├── tokenizer.json           # Qwen2.5 tokenizer
├── tokenizer_config.json    # Chat template config
├── generation_config.json   # Default generation settings
├── chat_template.jinja      # Jinja chat template
└── README.md

⚠️ Known Limitations

⚠️ Issue	📝 Notes
🔁 Hallucination	May occasionally hallucinate or oversimplify (1.5B scale)
🗣️ Identity Bias	May append "How can I help you today?" — reduce via system prompt tuning
🔢 Math Reasoning	Limited complex math ability (small model)
🌍 Language	Primarily English-focused
🚫 Critical Use	Not suitable for medical, legal, or safety-critical applications
🔍 Web Search	Tool pattern only — no live search engine included

🔒 Safety & Alignment

NYXIS is trained with:

✅ Identity alignment dataset (consistent persona)
✅ Instruction-balanced samples (diverse and safe)
✅ Controlled decoding configuration (anti-loop)

Recommended generation settings:

temperature = 0.6
top_p = 0.9
repetition_penalty = 1.1  # anywhere in the 1.1 to 1.2 range
no_repeat_ngram_size = 3
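
To illustrate what `temperature` and `top_p` do, here is a simplified pure-Python sketch of temperature scaling followed by nucleus (top-p) filtering on a toy set of logits. It is not the Transformers implementation, and the function name `sampling_distribution` and the example logits are made up for illustration.

```python
import math


def sampling_distribution(logits, temperature=0.6, top_p=0.9):
    """Return {token: prob} after temperature scaling and top-p filtering."""
    # Temperature below 1 sharpens the distribution toward the top tokens.
    scaled = {tok: value / temperature for tok, value in logits.items()}
    peak = max(scaled.values())
    exps = {tok: math.exp(value - peak) for tok, value in scaled.items()}
    total = sum(exps.values())
    probs = {tok: e / total for tok, e in exps.items()}

    # Keep the smallest set of top tokens whose cumulative mass reaches top_p.
    kept, cumulative = {}, 0.0
    for tok, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept[tok] = p
        cumulative += p
        if cumulative >= top_p:
            break

    # Renormalise so the surviving probabilities sum to 1.
    norm = sum(kept.values())
    return {tok: p / norm for tok, p in kept.items()}


toy_logits = {"a": 2.0, "b": 1.0, "c": 0.0, "d": -3.0}
dist = sampling_distribution(toy_logits)
print(dist)
```

With these toy logits only the two most likely tokens survive the filter, which shows how the recommended settings cut off the low-probability tail before sampling.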

🚀 Version History

🏷️ Version	📅 Date	📝 Notes
v1.0	Early 2025	Initial LoRA fine-tune on TinyLlama
v1.1 (NYXIS 2.1)	2025	Rebuilt on Qwen2.5-1.5B-Instruct · QLoRA · Unsloth · 500 examples · Web-search tool · Full merge · HF deployment

📜 License

This model is licensed under the Apache 2.0 License,
following the original Qwen2.5-1.5B-Instruct license terms.

NYXIS • Built by QuantaSparkLabs • 2025–2026
Lightweight • Identity-Aligned • Efficient • Open Source

If you find NYXIS useful, give the repo a ❤️ and share your creations!

Downloads last month: 379

Safetensors
Model size: 2B params
Tensor type: BF16

Model tree for QuantaSparkLabs/NYXIS-1.1B
Base model: Qwen/Qwen2.5-1.5B
Finetuned: Qwen/Qwen2.5-1.5B-Instruct