Instructions to use WeiboAI/VibeThinker-3B with libraries, inference providers, notebooks, and local apps.

Libraries

How to use WeiboAI/VibeThinker-3B with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="WeiboAI/VibeThinker-3B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForMultimodalLM

tokenizer = AutoTokenizer.from_pretrained("WeiboAI/VibeThinker-3B")
model = AutoModelForMultimodalLM.from_pretrained("WeiboAI/VibeThinker-3B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))


vLLM

How to use WeiboAI/VibeThinker-3B with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "WeiboAI/VibeThinker-3B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "WeiboAI/VibeThinker-3B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
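
For callers who prefer Python over curl, the same request can be sketched with the standard library alone. This is a minimal sketch, assuming the server started above is listening on `http://localhost:8000` and returns the usual OpenAI-style chat completion body; the helper names `build_chat_request` and `ask` are illustrative and not part of vLLM.

```python
import json
import urllib.request


def build_chat_request(prompt, model="WeiboAI/VibeThinker-3B",
                       base_url="http://localhost:8000"):
    """Build a POST request for the OpenAI-compatible chat completions route."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt, **kwargs):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt, **kwargs)) as resp:
        body = json.load(resp)
    # OpenAI-style responses carry the reply under choices[0].message.content.
    return body["choices"][0]["message"]["content"]


# Requires a running server, so the call is left commented out:
# print(ask("What is the capital of France?"))
```

Passing a different `base_url` (for example one ending in port 30000) would point the same helper at another OpenAI-compatible server.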

Use Docker

docker model run hf.co/WeiboAI/VibeThinker-3B

SGLang

How to use WeiboAI/VibeThinker-3B with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "WeiboAI/VibeThinker-3B" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "WeiboAI/VibeThinker-3B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "WeiboAI/VibeThinker-3B" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "WeiboAI/VibeThinker-3B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner

How to use WeiboAI/VibeThinker-3B with Docker Model Runner:
```
docker model run hf.co/WeiboAI/VibeThinker-3B
```

This model passed almost all of my local reasoning tests. It is very strong; the only problem it could not solve is posted below.

by pypry - opened 2 days ago


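Five people come from different places, live in different houses, keep different animals, smoke different brands of cigarettes, drink different beverages, and like different foods. Use the clues below to determine where the person who keeps the cat comes from.
1. The red house is to the right of the blue house and to the left of the white house (not necessarily adjacent).
2. The owner of the yellow house is from Hong Kong, and his house is not the leftmost one.
3. The person who likes pizza lives next door to the person who drinks mineral water.
4. The person from Beijing drinks Maotai and lives next door to the person from Shanghai.
5. The person who smokes Hilton lives immediately to the right of the person who keeps the horse.
6. The person who drinks beer also likes chicken.
7. The person in the green house keeps the dog.
8. The person who likes noodles lives next door to the person who keeps the snake.
9. Of the two immediate neighbors of the person from Tianjin, one likes beef and the other is from Chengdu.
10. The person who keeps the fish lives in the rightmost house.
11. The person who smokes Marlboro lives between the Hilton smoker and the "555" smoker (immediately adjacent to both).
12. The person in the red house drinks tea.
13. The person who drinks wine lives immediately to the right of the person who likes tofu.
14. The person who smokes Hongtashan lives neither next door to the Kent smoker nor next door to the person from Shanghai.
15. The person from Shanghai lives in the second house from the left.
16. The person who drinks mineral water lives in the middle house.
17. The person who likes noodles also drinks wine.
18. The "555" smoker lives further to the right than the Hilton smoker.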
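
For anyone who wants a reference answer to compare a model's output against, the 18 clues can be checked mechanically with a small brute-force search. The following is a minimal sketch rather than anything from the original post: houses are numbered 0 (leftmost) to 4 (rightmost), "next door" is read as a position difference of exactly 1, and the names `solve` and `next_to` are illustrative.

```python
from itertools import permutations

HOUSES = range(5)  # 0 is the leftmost house, 4 the rightmost


def next_to(a, b):
    """True when two house positions are immediate neighbours."""
    return abs(a - b) == 1


def solve():
    """Return the cat owner's origin for every assignment satisfying all 18 clues."""
    answers = []
    for beijing, shanghai, hongkong, tianjin, chengdu in permutations(HOUSES):
        if shanghai != 1 or hongkong == 0:                    # clues 15, 2
            continue
        if not next_to(beijing, shanghai):                    # clue 4
            continue
        origin = {beijing: "Beijing", shanghai: "Shanghai",
                  hongkong: "Hong Kong", tianjin: "Tianjin", chengdu: "Chengdu"}
        for water, maotai, beer, tea, wine in permutations(HOUSES):
            if water != 2 or maotai != beijing:               # clues 16, 4
                continue
            for pizza, chicken, noodles, beef, tofu in permutations(HOUSES):
                if not next_to(pizza, water):                 # clue 3
                    continue
                if chicken != beer or noodles != wine:        # clues 6, 17
                    continue
                if wine != tofu + 1:                          # clue 13
                    continue
                # Clue 9: Tianjin's two neighbours are the beef eater and Chengdu.
                if {beef, chengdu} != {tianjin - 1, tianjin + 1}:
                    continue
                for red, blue, white, yellow, green in permutations(HOUSES):
                    if not blue < red < white:                # clue 1
                        continue
                    if yellow != hongkong or red != tea:      # clues 2, 12
                        continue
                    for hilton, marlboro, c555, hongtashan, kent in permutations(HOUSES):
                        # Clue 11: Marlboro sits directly between Hilton and 555.
                        if not (next_to(marlboro, hilton) and next_to(marlboro, c555)):
                            continue
                        if c555 < hilton:                     # clue 18
                            continue
                        # Clue 14: Hongtashan is beside neither Kent nor Shanghai.
                        if next_to(hongtashan, kent) or next_to(hongtashan, shanghai):
                            continue
                        for cat, horse, dog, snake, fish in permutations(HOUSES):
                            if hilton != horse + 1 or dog != green:        # clues 5, 7
                                continue
                            if not next_to(noodles, snake) or fish != 4:   # clues 8, 10
                                continue
                            answers.append(origin[cat])
    return answers


print(solve())
```

Under this reading of the clues the search finds exactly one consistent assignment, in which the cat owner lives in the fourth house from the left and comes from Tianjin.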