Instructions for using LumiVore/lumivore-1.2b with libraries and local apps.

Libraries

How to use LumiVore/lumivore-1.2b with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="LumiVore/lumivore-1.2b")
messages = [
    {"role": "user", "content": "Who are you?"},
]
print(pipe(messages))

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("LumiVore/lumivore-1.2b")
model = AutoModelForCausalLM.from_pretrained("LumiVore/lumivore-1.2b")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens, dropping special tokens such as the end-of-turn marker
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))


vLLM

How to use LumiVore/lumivore-1.2b with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "LumiVore/lumivore-1.2b"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "LumiVore/lumivore-1.2b",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/LumiVore/lumivore-1.2b

SGLang

How to use LumiVore/lumivore-1.2b with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "LumiVore/lumivore-1.2b" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "LumiVore/lumivore-1.2b",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "LumiVore/lumivore-1.2b" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "LumiVore/lumivore-1.2b",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
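Both servers above expose an OpenAI-compatible chat-completions endpoint, so the curl calls can also be made from Python. The sketch below uses only the standard library; the helper names `build_chat_request` and `chat` are hypothetical, the URLs assume the ports used in the commands above (8000 for vLLM, 30000 for SGLang), and `max_tokens` is a standard OpenAI-style request field whose value here is arbitrary.

```python
import json
import urllib.request

# Assumed endpoints, matching the ports in the vLLM and SGLang commands above.
VLLM_URL = "http://localhost:8000/v1/chat/completions"
SGLANG_URL = "http://localhost:30000/v1/chat/completions"
MODEL_ID = "LumiVore/lumivore-1.2b"


def build_chat_request(url: str, user_message: str, max_tokens: int = 100) -> urllib.request.Request:
    """Build (but do not send) a POST request with the same JSON body as the curl examples."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": user_message}],
        "max_tokens": max_tokens,  # illustrative limit; keep prompt + output within the 2048-token context
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(url: str, user_message: str) -> str:
    """Send the request to a running server and return the assistant's reply text."""
    request = build_chat_request(url, user_message)
    with urllib.request.urlopen(request, timeout=60) as response:
        body = json.load(response)
    # OpenAI-style responses carry the reply in choices[0].message.content.
    return body["choices"][0]["message"]["content"]
```

With a server running, `print(chat(VLLM_URL, "What is the capital of France?"))` should print the model's reply; pass `SGLANG_URL` instead to talk to the SGLang server.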


LumiVore-1.2B

LumiVore-1.2B is a Mixture-of-Experts (MoE) language model fine-tuned for agentic workflows and conversational AI. All fine-tuning was done on a single consumer GPU (AMD RX 7600 XT, 16 GB), showing that capable language models can be developed without datacenter-scale resources.

Model Details

Attribute	Value
Architecture	Mixture-of-Experts (DeepSeek-MoE style)
Base Model	Qwen2.5-0.5B-Instruct
Total Parameters	1.36B
Active Parameters	~610M per token (top-2: shared expert + 1 routed expert)
Experts	8 (1 shared + 7 routed)
MoE Layers	8 of 24 transformer layers
Context Length	2048 tokens
Precision	bfloat16

Architecture

LumiVore-1.2B uses a Mixture-of-Experts architecture with:

8 experts total: 1 shared expert always active + 7 routed experts
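Top-2 routing: each token is processed by 2 experts, the always-active shared expert plus the 1 routed expert the router selects for it
Sparse activation: Only ~610M parameters are active per token despite 1.36B total
Load balancing: Auxiliary losses encourage even utilization of the routed experts

This design gives the model the capacity of 1.36B parameters while per-token compute scales with the ~610M active parameters. All 1.36B parameters must still be held in memory at inference time.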
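To make the shared-plus-routed scheme concrete, here is a toy, pure-Python sketch of one MoE layer in the spirit of the description above. It is not LumiVore's implementation: the class and helper names are hypothetical, the experts are tiny linear maps, the router is a linear layer followed by a softmax over the 7 routed experts, and the chosen routed expert's output is scaled by its gate probability while the shared expert's output is added unweighted. The card does not say which auxiliary loss was used; `load_balancing_loss` implements one common choice (the Switch Transformer formulation, N · Σ f_i · P_i), which equals 1.0 under perfectly uniform routing and rises toward N as tokens collapse onto a single expert.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def random_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


class ToySharedMoE:
    """Hypothetical toy MoE layer: 1 always-on shared expert + top-1 of n_routed experts."""

    def __init__(self, dim, n_routed=7, seed=0):
        rng = random.Random(seed)
        self.shared = random_matrix(dim, dim, rng)
        self.routed = [random_matrix(dim, dim, rng) for _ in range(n_routed)]
        self.router = random_matrix(n_routed, dim, rng)  # one logit per routed expert

    def forward(self, x):
        probs = softmax(matvec(self.router, x))
        k = max(range(len(probs)), key=probs.__getitem__)  # top-1 routed expert
        shared_out = matvec(self.shared, x)                # shared expert runs for every token
        routed_out = matvec(self.routed[k], x)             # only 1 of the 7 routed experts runs
        y = [s + probs[k] * r for s, r in zip(shared_out, routed_out)]
        return y, k, probs


def load_balancing_loss(chosen, probs_per_token, n_routed):
    """Switch-Transformer-style aux loss: n_routed * sum_i f_i * P_i.

    f_i: fraction of tokens dispatched to expert i; P_i: mean router probability for expert i.
    """
    t = len(chosen)
    f = [sum(1 for c in chosen if c == i) / t for i in range(n_routed)]
    p = [sum(row[i] for row in probs_per_token) / t for i in range(n_routed)]
    return n_routed * sum(fi * pi for fi, pi in zip(f, p))


layer = ToySharedMoE(dim=4)
rng = random.Random(1)
tokens = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(16)]
results = [layer.forward(tok) for tok in tokens]
chosen = [k for _, k, _ in results]
probs = [p for _, _, p in results]
print("routed expert per token:", chosen)
print("aux loss:", round(load_balancing_loss(chosen, probs, 7), 3))
```

Per token, only the shared expert, one routed expert, and the router are evaluated, which is the toy analogue of ~610M active out of 1.36B total parameters.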

Training

Stage 1: Capability Building

Dataset: TerminalTrajectories + OpenThoughts (~11,600 examples)
Method: Full fine-tuning with LoRA on routing layers
Duration: ~5.4 hours
Goal: General agent capabilities, tool use, reasoning

Stage 2: Domain Adaptation

Dataset: OpenClaw agent-specific data (~11,900 examples)
Method: LoRA fine-tuning (rank=64, attention + routing)
Duration: ~5 hours
Goal: OpenClaw ecosystem specialization

Hardware

GPU: AMD RX 7600 XT (16GB VRAM)
Framework: PyTorch with ROCm
Optimizer: 8-bit AdamW
Total Training Time: ~10 hours
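The 8-bit optimizer is a large part of what makes a 16 GB card workable. The back-of-the-envelope sketch below is not the card's measured memory usage: it assumes a full fine-tune of all 1.36B parameters with bf16 weights and gradients and no fp32 master copy, and it ignores activations, the KV cache, and framework overhead. Standard AdamW keeps two fp32 moment buffers (8 bytes per parameter), while 8-bit AdamW stores each moment in roughly 1 byte; the helper names are hypothetical.

```python
N_PARAMS = 1.36e9  # total parameters from the Model Details table


def adamw_state_gb(n_params, bytes_per_state, n_states=2):
    """Memory (GB, 1e9 bytes) for AdamW's two moment buffers (first and second moments)."""
    return n_params * bytes_per_state * n_states / 1e9


def full_finetune_gb(n_params, bytes_per_state):
    """Weights + gradients + optimizer state, assuming bf16 weights and gradients (2 bytes each)."""
    weights = n_params * 2 / 1e9
    grads = n_params * 2 / 1e9
    return weights + grads + adamw_state_gb(n_params, bytes_per_state)


for label, nbytes in [("fp32 AdamW", 4), ("8-bit AdamW", 1)]:
    print(f"{label}: {full_finetune_gb(N_PARAMS, nbytes):.2f} GB before activations")
# fp32 AdamW: 16.32 GB before activations
# 8-bit AdamW: 8.16 GB before activations
```

Under these assumptions, fp32 optimizer state alone pushes the total past the card's 16 GB, while the 8-bit variant leaves roughly half the VRAM for activations.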

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "LumiVore/lumivore-1.2b"

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

prompt = "You are a helpful AI assistant.\n\nUser: Hello!\nAssistant:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=100, temperature=0.7, do_sample=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Limitations

Small base: Built on Qwen2.5-0.5B, so the base model's limitations (such as limited factual knowledge) carry over
Training scale: 23K examples vs. millions for production models
Identity: May occasionally claim to be other models (GPT-4, Qwen, etc.)
Verbosity: Can be verbose; use system prompts to guide conciseness
No RLHF: No reinforcement learning from human feedback

Evaluation

This model prioritizes:

✅ Agentic tool use — calling functions, following patterns
✅ Structured outputs — JSON, markdown, code
✅ Conversational flow — turn-taking, context tracking
⚠️ Creative writing — not a primary training objective
❌ Factual knowledge — limited by base model size
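Because the model targets structured outputs, downstream code usually has to pull JSON out of a reply that may also contain prose. One defensive approach, sketched here with the standard library's `json.JSONDecoder.raw_decode`, is to try decoding at each `{` or `[` and return the first value that parses. The helper name and the sample reply are hypothetical; the card does not specify a tool-call format.

```python
import json


def extract_first_json(text):
    """Return the first JSON object or array that parses in `text`, or None if there is none."""
    decoder = json.JSONDecoder()
    for start, ch in enumerate(text):
        if ch in "{[":
            try:
                value, _end = decoder.raw_decode(text, start)
                return value
            except json.JSONDecodeError:
                continue  # not valid JSON starting here; keep scanning
    return None


# Hypothetical model reply mixing prose and a tool call.
reply = 'Sure, I will list the files. {"tool": "ls", "args": {"path": "/tmp"}} Done.'
print(extract_first_json(reply))  # {'tool': 'ls', 'args': {'path': '/tmp'}}
```

Scanning every candidate start is quadratic in the worst case, which is acceptable for replies bounded by the 2048-token context.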

Resources

Resource	Link
GitHub	https://github.com/dansan-claw/lumivore
Website	https://lumivore.ai
Discord	https://discord.gg/M7U8JCUukD
Datasets	See LumiVore organization

Citation

@misc{lumivore-1.2b,
  title={LumiVore-1.2B: A Mixture-of-Experts Model for Agentic AI},
  author={van Eek, Daniel},
  year={2026},
  url={https://huggingface.co/LumiVore/lumivore-1.2b}
}

License

Apache 2.0 — use it, modify it, ship it in your products.

LumiVore AI explores the future of intelligent systems — building AI that is efficient, adaptable, and accessible.


Safetensors model size: 1B params; tensor types: F32, BF16

Model tree for LumiVore/lumivore-1.2b: Qwen/Qwen2.5-0.5B (base model) → Qwen/Qwen2.5-0.5B-Instruct (fine-tuned) → LumiVore/lumivore-1.2b