lab22-dpo-vn

DPO-aligned LoRA adapter for Qwen2.5-3B. Lab 22, AICB-P2T3, VinUniversity A20 2026.

Student: Ho Dac Toan (2A202600057)

Training

| Stage | Details |
| --- | --- |
| Base | unsloth/Qwen2.5-3B-bnb-4bit (NF4 4-bit) |
| SFT | saillab/alpaca-vietnamese-cleaned · 1 000 samples · 1 epoch · LoRA r=16 α=32 |
| DPO | argilla/ultrafeedback-binarized-preferences-cleaned · 2 000 pairs · 1 epoch |
| beta | 0.1 |
| lr | 5e-07 |
| Reward gap (end) | 0.20582379102706905 |
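
The beta and reward-gap entries above can be read against the standard DPO objective. The sketch below is illustrative, not the training code used for this adapter: it assumes "Reward gap" means the chosen-minus-rejected implicit reward margin (the quantity TRL logs as `rewards/margins`), which this card does not state explicitly. The function name `dpo_stats` and all input values are hypothetical.

```python
import math


def dpo_stats(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Per-pair DPO quantities from summed sequence log-probabilities.

    pi_*  : log-prob of the response under the policy being trained
    ref_* : log-prob of the same response under the frozen reference model
    beta  : DPO temperature (0.1 in the table above)
    """
    # Implicit rewards: beta times the policy/reference log-ratio.
    reward_chosen = beta * (pi_chosen - ref_chosen)
    reward_rejected = beta * (pi_rejected - ref_rejected)

    # Reward gap (margin): how much more the policy prefers the chosen answer.
    gap = reward_chosen - reward_rejected

    # DPO loss = -log(sigmoid(gap)) = softplus(-gap), computed stably.
    z = -gap
    loss = max(z, 0.0) + math.log1p(math.exp(-abs(z)))
    return reward_chosen, reward_rejected, gap, loss


# Hypothetical log-probs, chosen only to illustrate the arithmetic.
r_c, r_r, gap, loss = dpo_stats(-10.0, -12.0, -11.0, -11.0, beta=0.1)
print(f"gap={gap:.3f} loss={loss:.4f}")
```

Under this reading, a gap of zero gives a loss of log 2 (about 0.693), and a positive gap means the policy separates chosen from rejected responses more than the reference model does.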

Evaluation

| Benchmark | SFT-only | SFT+DPO | Δ |
| --- | --- | --- | --- |
| IFEval / GSM8K / MMLU / AlpacaEval-lite | run NB6 to populate | | |

Usage

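# Import unsloth first so its patches are applied before peft is loaded.
from unsloth import FastLanguageModel
from peft import PeftModel

# Load the 4-bit (NF4) base model and its tokenizer.
model, tokenizer = FastLanguageModel.from_pretrained(
    "unsloth/Qwen2.5-3B-bnb-4bit", load_in_4bit=True
)
# Attach the DPO-aligned LoRA adapter on top of the base model.
model = PeftModel.from_pretrained(model, "dactoan123/lab22-dpo-vn")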
