mxz/llama3-8b-dpo


dataset Instruction

datasets:
- mxz/CValues_DPO
language:
- zh
- en
metrics:
- perplexity
pipeline_tag:
- text-generation
tags:
- DPO
- finetune
- alignment
- LoRA
- Llama-3

About mxz-llama-3-8B-sft

This model was trained with SFT and PPO.

It supports coding, reasoning, and Chinese QA.

Evaluation

Result:

Model	MMLU	C-EVAL	C-MMLU
Llama-3-8B	55.5	47.0	48.0
Llama-3-8B-Instruct	60.1	49.7	49.3
Llama-3-8B-dpo	62.2	49.9	49.4

The Llama-3-8B evaluation results are taken from ymcui/Chinese-LLaMA-Alpaca-3.
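
To make the comparison in the table easier to read, the per-benchmark gains of Llama-3-8B-dpo over the two baselines can be computed directly from the reported scores. The snippet below is only a small sketch: the `scores` dictionary and the `score_gains` helper are illustrative names, not part of this repository, and the numbers are copied from the table rather than re-measured.

```python
# Scores copied from the results table (not re-measured here).
scores = {
    "Llama-3-8B": {"MMLU": 55.5, "C-EVAL": 47.0, "C-MMLU": 48.0},
    "Llama-3-8B-Instruct": {"MMLU": 60.1, "C-EVAL": 49.7, "C-MMLU": 49.3},
    "Llama-3-8B-dpo": {"MMLU": 62.2, "C-EVAL": 49.9, "C-MMLU": 49.4},
}


def score_gains(target, baseline):
    """Return the per-benchmark difference target - baseline, rounded to 0.1.

    Hypothetical helper for illustration only.
    """
    return {
        bench: round(scores[target][bench] - scores[baseline][bench], 1)
        for bench in scores[target]
    }


if __name__ == "__main__":
    for baseline in ("Llama-3-8B", "Llama-3-8B-Instruct"):
        print(baseline, score_gains("Llama-3-8B-dpo", baseline))
```

On these reported numbers the gain over the base model is largest on MMLU, while the margin over Llama-3-8B-Instruct on C-EVAL and C-MMLU is a few tenths of a point at most.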


Safetensors

Model size: 8B params

Tensor type: F32, BF16