SteraVibeThinker

A full fine-tune of WeiboAI/VibeThinker-3B (a 3B reasoning model built on the Qwen2.5-3B / Qwen2.5-Coder-3B architecture) on the ~30k-example Tiny-Giant agentic tool-use dataset.

The goal: keep VibeThinker's strong verifiable-reasoning core while teaching it the deterministic, Hermes/ChatML-style <tool_call> agent format used by the Tiny-Giant harness.

Files

File Description
SteraVibeThinker-Q4_K_M.gguf Q4_K_M quantization (~1.8 GB) โ€” for llama.cpp / Ollama / LM Studio
SteraVibeThinker-f16.gguf f16 GGUF (~5.8 GB) โ€” re-quantize to any level without retraining
raw_weights/ Full bf16 safetensors HF checkpoint
val_meta.jsonl Held-out validation set shipped with the model

Training

  • Base: WeiboAI/VibeThinker-3B (MIT, Qwen2.5-3B architecture, ChatML-native)
  • Method: full fine-tune (not LoRA), bf16 + gradient checkpointing
  • Data: ~30k Tiny-Giant agentic tool-use conversations
  • Epochs: 2 ยท LR: 7e-6 (cosine, 3% warmup) ยท Seq len: 4096
  • Loss: full-sequence (tool results modeled as in-distribution context)

Prompt format

This model was trained with an explicit ChatML / Hermes renderer, not tokenizer.apply_chat_template. Pin the ChatML template explicitly when serving โ€” do not rely on auto-detection. Tool calls use:

<tool_call>
{"name": "<function-name>", "arguments": {...}}
</tool_call>

Inference (llama.cpp)

llama-cli -m SteraVibeThinker-Q4_K_M.gguf --chat-template chatml

License

MIT, inherited from the VibeThinker-3B base model.

Downloads last month
59
GGUF
Model size
3B params
Architecture
qwen2
Hardware compatibility
Log In to add your hardware

4-bit

16-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. ๐Ÿ™‹ Ask for provider support

Model tree for strykes/SteraVibeThinker

Base model

Qwen/Qwen2.5-3B
Quantized
(40)
this model