This model is a fine-tuned version of Qwen/Qwen3-4B-Instruct-2507 using Direct Preference Optimization (DPO).
This repository contains a LoRA adapter trained for structured data generation tasks (JSON, YAML, TOML, XML, CSV, etc.).
The system prompt used at inference is also embedded in the DPO training data, so the training and inference prompt formats match, which improves output quality.
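As a rough illustration of what embedding the system prompt could look like, the sketch below builds one preference record whose prompt already carries the system message. The field names `prompt`, `chosen`, and `rejected` follow a common DPO dataset layout and are an assumption here, as is the helper `build_dpo_record`; the actual schema of u-10bei/dpo-dataset-qwen-cot and the preprocessing used for this adapter are not shown in this card.

```python
FENCE = "`" * 3  # three backticks, built this way to keep the example easy to embed

SYSTEM_PROMPT = (
    "You are a structured data expert. "
    "Output the requested format directly without any explanation, "
    "preamble, or markdown code blocks. "
    f"Do not write {FENCE}json, {FENCE}yaml, {FENCE}toml, {FENCE}xml, {FENCE}csv or similar. "
    "Output only the raw structured data."
)


def build_dpo_record(user_request, chosen, rejected):
    """Hypothetical helper: one preference pair with the system prompt in the prompt."""
    return {
        # The prompt has the same shape as the messages passed at inference time.
        "prompt": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_request},
        ],
        "chosen": [{"role": "assistant", "content": chosen}],
        "rejected": [{"role": "assistant", "content": rejected}],
    }


raw_json = '{"name": "Alice", "age": 30, "city": "Tokyo"}'
record = build_dpo_record(
    "Convert to JSON: name=Alice, age=30, city=Tokyo",
    chosen=raw_json,                                   # raw output is preferred
    rejected=f"{FENCE}json\n{raw_json}\n{FENCE}",      # fenced output is dispreferred
)
print([m["role"] for m in record["prompt"]])
```

With records shaped like this, the model sees the same system message during preference training that it later sees at inference.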
| Parameter | Value |
|---|---|
| Base model | Qwen/Qwen3-4B-Instruct-2507 + SFT (Exp5) |
| Method | DPO (Direct Preference Optimization) |
| Dataset | u-10bei/dpo-dataset-qwen-cot |
| LoRA rank (r) | 16 |
| LoRA alpha | 32 |
| Learning rate | 5e-7 |
| Epochs | 2 |
| Batch size | 4 (grad accum: 2) |
| Beta | 0.1 |
| Max length | 1024 |
| Max prompt length | 512 |
| Optimizer | AdamW |
| Warmup ratio | 0.1 |
| Seed | 3407 |
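To make the Beta row concrete: for each preference pair, DPO minimizes the negative log-sigmoid of beta times the difference between the policy-versus-reference log-probability gap on the chosen answer and the same gap on the rejected answer (Rafailov et al., 2023). The sketch below evaluates that expression in plain Python with beta = 0.1 from the table. The log-probability values are made-up numbers for illustration only, not measurements from this training run, and `dpo_loss` is a hypothetical helper rather than the training code used here.

```python
import math

BETA = 0.1  # from the table above


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=BETA):
    """Per-pair DPO loss computed from sequence log-probabilities."""
    # Implicit rewards: how far the policy has moved from the reference model.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # -log(sigmoid(margin)) written as log(1 + exp(-margin)).
    return math.log1p(math.exp(-margin))


# Illustrative values: the policy raises the chosen answer and lowers the rejected one.
improved = dpo_loss(-10.0, -15.0, -12.0, -13.0)
# A policy identical to the reference has zero margin, giving a loss of log(2).
unchanged = dpo_loss(-12.0, -13.0, -12.0, -13.0)
print(improved < unchanged)
```

A small beta such as 0.1 scales the margin down, so the policy has to move further from the reference before the loss drops noticeably.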
System prompt (embedded in the DPO training data; use it unchanged at inference):
You are a structured data expert. Output the requested format directly without any explanation, preamble, or markdown code blocks. Do not write ```json, ```yaml, ```toml, ```xml, ```csv or similar. Output only the raw structured data.
# Load the base model, apply the LoRA adapter, and generate structured output.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE_MODEL_ID = "Qwen/Qwen3-4B-Instruct-2507"
ADAPTER_PATH = "tenyyprn/qwen3-4b-structeval-exp13"

# Same system prompt that was embedded in the DPO training data.
SYSTEM_PROMPT = (
    "You are a structured data expert. "
    "Output the requested format directly without any explanation, "
    "preamble, or markdown code blocks. "
    "Do not write ```json, ```yaml, ```toml, ```xml, ```csv or similar. "
    "Output only the raw structured data."
)

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(BASE_MODEL_ID, torch_dtype=torch.float16, device_map="auto")

# Attach the adapter, then fold its weights into the base model for plain inference.
model = PeftModel.from_pretrained(model, ADAPTER_PATH)
model = model.merge_and_unload()
model.eval()

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": "Convert to JSON: name=Alice, age=30, city=Tokyo"},
]

# Render the chat template as text, then tokenize and move inputs to the model's device.
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

# Greedy decoding; print only the newly generated tokens, not the prompt.
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=False)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
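Because the system prompt asks for raw structured data, a quick sanity check is to parse the decoded text directly. The sketch below does this for JSON using only the standard library. `parse_json_output` is a hypothetical helper, and the fence-stripping fallback is an assumption for cases where a model still wraps its answer in a markdown code block; it is not part of this repository.

```python
import json

FENCE = "`" * 3  # three backticks


def parse_json_output(text):
    """Parse model output as JSON, tolerating an optional markdown code fence."""
    cleaned = text.strip()
    if cleaned.startswith(FENCE):
        lines = cleaned.splitlines()[1:]           # drop the opening fence line
        if lines and lines[-1].strip() == FENCE:   # drop the closing fence line
            lines = lines[:-1]
        cleaned = "\n".join(lines)
    # Raises json.JSONDecodeError if the output is not valid JSON.
    return json.loads(cleaned)


raw = '{"name": "Alice", "age": 30, "city": "Tokyo"}'
fenced = FENCE + "json\n" + raw + "\n" + FENCE
print(parse_json_output(raw) == parse_json_output(fenced))
```

In practice the string passed in would be the decoded generation from the snippet above; other formats such as YAML or TOML would need their own parsers.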
@inproceedings{rafailov2023direct,
title = {{Direct Preference Optimization: Your Language Model is Secretly a Reward Model}},
author = {Rafael Rafailov and Archit Sharma and Eric Mitchell and Christopher D. Manning and Stefano Ermon and Chelsea Finn},
year = 2023,
booktitle = {Advances in Neural Information Processing Systems 36},
url = {http://papers.nips.cc/paper_files/paper/2023/hash/a85b405ed65c6477a4fe8302b5e06ce7-Abstract-Conference.html},
}