license: apache-2.0
base_model: Qwen/Qwen3.5-2B
tags:
- qwen
- qwen3.5
- reasoning
- distillation
- claude-opus
- darwin-v8
- sft
- lora
- merged
language:
- en
- ko
- zh
- ja
pipeline_tag: text-generation
library_name: transformers
🧠 lastbrain: Darwin V8
A Claude Opus distillation model built with Darwin V8 (2B parameters)
- 👨 Father (Base): Qwen/Qwen3.5-2B
- 👩 Mother (LoRA Adapter): FINAL-Bench/Qwen3.5-2B-Opus-Distill-v1
- 👶 Child (This model): FINAL-Bench/lastbrain, a merged, full-weight standalone model
Features
- Base: Qwen3.5-2B (2.3B parameters, hybrid attention)
- Training: SFT + LoRA (all-linear, rank=16, α=32)
- Teachers: Claude Opus 4.5 / 4.6, Claude Sonnet 4.6 (pre-generated reasoning traces)
- Data: 4,451 high-quality reasoning traces (from 4 public datasets)
- Merged: the LoRA adapter is fully merged into the base weights, so the model runs standalone
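With rank r=16 and α=32, the LoRA update is scaled by α/r = 2 (PEFT's default, non-rsLoRA scaling). The sketch below shows that factor and the number of trainable parameters LoRA adds to a single linear layer, r·(d_in + d_out). The 2048×2048 layer shape is a hypothetical example chosen for illustration, not a documented Qwen3.5-2B dimension.

```python
def lora_scaling(rank: int, alpha: float) -> float:
    """Default LoRA scaling factor (alpha / r) applied to the low-rank update B @ A."""
    return alpha / rank


def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Trainable LoRA parameters for one linear layer: A is (rank x d_in), B is (d_out x rank)."""
    return rank * d_in + d_out * rank


RANK, ALPHA = 16, 32  # values stated on this card

# Hypothetical layer shape, for illustration only
D_IN, D_OUT = 2048, 2048
full_params = D_IN * D_OUT
lora_params = lora_param_count(D_IN, D_OUT, RANK)

print(f"scaling alpha/r = {lora_scaling(RANK, ALPHA)}")
print(f"LoRA params: {lora_params:,} vs. full matrix: {full_params:,} "
      f"({lora_params / full_params:.2%} of the layer)")
```

Because the adapter touches only a small fraction of each matrix, a high learning rate on those few parameters is what the card's "LoRA Without Regret" setting relies on.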
Quick start
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "FINAL-Bench/lastbrain"
tok = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto", trust_remote_code=True
)

messages = [
    {"role": "user", "content": "If a train travels 60 km in 45 minutes, what is its speed in km/h?"}
]
# Render the chat template to a string, then tokenize it
prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tok(prompt, return_tensors="pt").to(model.device)

# Greedy decoding; step-by-step answers can be long, so allow up to 800 new tokens
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=800,
        do_sample=False,
        pad_token_id=tok.eos_token_id,
    )

# Decode only the newly generated tokens, skipping the prompt
print(tok.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
```
Example output:
To find the speed of the train in km/h, we need to convert the given time from minutes to hours.
**Given:**
- Distance = 60 km
- Time = 45 minutes
**Step 1: Convert time to hours**
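Since there are 60 minutes in 1 hour:
$$45 \text{ minutes} = \frac{45}{60} \text{ hours} = 0.75 \text{ hours}$$
**Step 2: Calculate speed**
$$\text{Speed} = \frac{\text{Distance}}{\text{Time}} = \frac{60 \text{ km}}{0.75 \text{ h}} = 80 \text{ km/h}$$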
**Final Answer:** The speed of the train is **80 km/h**.
🧬 Darwin V8 training pipeline
```
[Qwen/Qwen3.5-2B] ──── Base model (frozen)
        +
[4,451 Claude Opus/Sonnet reasoning traces]
        ↓
   [SFT Training]
   - LoRA (all-linear, r=16, α=32)
   - Learning rate: 2e-4 (V8 rule: 10× the full fine-tuning LR)
   - 2 epochs, bf16, 8×B200 DDP
   - Loss: 1.33 → 1.10 (-17%)
   - Token accuracy: 68% → 72% (+4 pp)
        ↓
[LoRA merge into base weights]
        ↓
[lastbrain] ← this model
```
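The loss figure is a relative change, while the token-accuracy figure is an absolute change in percentage points (pp). The sketch below reproduces both from the endpoints in the pipeline and, reading the "V8 rule" as "LoRA LR = 10× the full fine-tuning LR", back-computes the implied full fine-tuning learning rate; that reading of the rule is an assumption.

```python
def relative_change(before: float, after: float) -> float:
    """Fractional change measured against the starting value."""
    return (after - before) / before


loss_before, loss_after = 1.33, 1.10
acc_before, acc_after = 68.0, 72.0  # token accuracy, in percent

loss_rel = relative_change(loss_before, loss_after)  # relative change in loss
acc_pp = acc_after - acc_before                       # absolute change, percentage points
acc_rel = relative_change(acc_before, acc_after)      # same gain expressed relatively

# Assumption: "V8 rule: x10 FullFT" means LoRA LR = 10 x the full fine-tuning LR
LORA_LR = 2e-4
implied_full_ft_lr = LORA_LR / 10

print(f"loss: {loss_rel:+.0%}")
print(f"token accuracy: {acc_pp:+.0f} pp ({acc_rel:+.1%} relative)")
print(f"implied full fine-tuning LR: {implied_full_ft_lr:.0e}")
```

Reporting accuracy in pp keeps it from being confused with the roughly 6% relative gain the same 4-point move represents.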
Training data composition
| Dataset | Samples | Source teacher |
|---|---|---|
| nohurry/Opus-4.6-Reasoning-3000x-filtered | 2,326 | Claude Opus 4.6 |
| TeichAI/Claude-Opus-4.6-Reasoning-887x | 887 | Claude Opus 4.6 |
| TeichAI/claude-4.5-opus-high-reasoning-250x | 250 | Claude Opus 4.5 |
| TeichAI/Claude-Sonnet-4.6-Reasoning-1100x | 1,100 | Claude Sonnet 4.6 |
| Total (after filtering) | 4,451 | - |
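The per-dataset rows add up to more than the filtered total. A quick tally using only the numbers in the table shows the size of that gap and how the listed samples split across teacher models; the card does not say which datasets the filtering removed samples from.

```python
# Sample counts and teachers copied from the table above
rows = [
    ("nohurry/Opus-4.6-Reasoning-3000x-filtered", 2326, "Claude Opus 4.6"),
    ("TeichAI/Claude-Opus-4.6-Reasoning-887x", 887, "Claude Opus 4.6"),
    ("TeichAI/claude-4.5-opus-high-reasoning-250x", 250, "Claude Opus 4.5"),
    ("TeichAI/Claude-Sonnet-4.6-Reasoning-1100x", 1100, "Claude Sonnet 4.6"),
]
REPORTED_TOTAL = 4451  # "Total (after filtering)" row

listed_sum = sum(n for _, n, _ in rows)
gap = listed_sum - REPORTED_TOTAL

# Group the listed counts by teacher model
by_teacher = {}
for _, n, teacher in rows:
    by_teacher[teacher] = by_teacher.get(teacher, 0) + n

print(f"sum of listed rows: {listed_sum:,}")
print(f"reported total after filtering: {REPORTED_TOTAL:,} (gap: {gap:,})")
for teacher, n in sorted(by_teacher.items(), key=lambda kv: -kv[1]):
    print(f"  {teacher}: {n:,} ({n / listed_sum:.1%} of listed rows)")
```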
🎯 Design philosophy (Darwin V8)
- LoRA Without Regret: target all linear layers, use a high LR; a small rank is fine
- Response Distillation: cost-efficient distillation from pre-generated Opus traces
- Merge-and-Deploy: merge the LoRA adapter into the base, then deploy with no extra dependencies
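In PEFT, `target_modules="all-linear"` applies LoRA to every linear layer except the output head. The sketch below mimics that selection rule over a list of (module name, layer type) pairs. The module names are hypothetical placeholders, not the actual Qwen3.5-2B module tree (its hybrid-attention blocks likely use different names), and PEFT itself inspects module classes rather than matching strings.

```python
from typing import List, Tuple

# Hypothetical (module name, layer type) pairs for one decoder block plus the head
modules: List[Tuple[str, str]] = [
    ("model.embed_tokens", "Embedding"),
    ("model.layers.0.self_attn.q_proj", "Linear"),
    ("model.layers.0.self_attn.k_proj", "Linear"),
    ("model.layers.0.self_attn.v_proj", "Linear"),
    ("model.layers.0.self_attn.o_proj", "Linear"),
    ("model.layers.0.mlp.gate_proj", "Linear"),
    ("model.layers.0.mlp.up_proj", "Linear"),
    ("model.layers.0.mlp.down_proj", "Linear"),
    ("model.layers.0.input_layernorm", "RMSNorm"),
    ("lm_head", "Linear"),
]


def all_linear_targets(mods: List[Tuple[str, str]], output_head: str = "lm_head") -> List[str]:
    """Select every Linear layer except the output head, mirroring the 'all-linear' rule."""
    return [name for name, kind in mods if kind == "Linear" and name != output_head]


targets = all_linear_targets(modules)
print(len(targets), "target modules")
for name in targets:
    print(" ", name)
```

Covering the MLP projections as well as attention is the point of "all-linear": the adapter reaches most of the model's weights even at a small rank.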
How to reproduce
This model was created by merging the following two components:
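```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

# Father: Qwen3.5-2B base weights in bf16
base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3.5-2B", torch_dtype=torch.bfloat16
)
# Mother: attach the Opus-Distill LoRA adapter on top of the base
model = PeftModel.from_pretrained(
    base, "FINAL-Bench/Qwen3.5-2B-Opus-Distill-v1"
)
# Child: fold the LoRA deltas into the base weights and drop the PEFT wrappers
merged = model.merge_and_unload()
merged.save_pretrained("./lastbrain")

# Assumption: the tokenizer is unchanged from the base model; saving it here
# makes ./lastbrain loadable on its own
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3.5-2B")
tokenizer.save_pretrained("./lastbrain")
```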
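`merge_and_unload()` folds each LoRA update into its base weight, W' = W + (α/r)·B·A, so the merged layer produces the same output as base-plus-adapter without any PEFT code at inference time. The toy sketch below checks that identity in pure Python on small hypothetical matrices; real layers are far larger and stored in bf16, where rounding makes the match approximate rather than exact.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Nested-list matrix product a @ b."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def matvec(m: Matrix, x: List[float]) -> List[float]:
    """Matrix-vector product m @ x."""
    return [sum(m[i][j] * x[j] for j in range(len(x))) for i in range(len(m))]


def merge_lora(w: Matrix, b: Matrix, a: Matrix, scale: float) -> Matrix:
    """Fold a LoRA update into the base weight: W' = W + scale * (B @ A)."""
    delta = matmul(b, a)
    return [[w[i][j] + scale * delta[i][j] for j in range(len(w[0]))] for i in range(len(w))]


# Toy, hypothetical shapes: d_out = 3, d_in = 4, rank r = 2
W = [[0.1, 0.2, 0.3, 0.4],
     [0.5, 0.6, 0.7, 0.8],
     [0.9, 1.0, 1.1, 1.2]]        # base weight (d_out x d_in)
A = [[0.01, 0.02, 0.03, 0.04],
     [0.05, 0.06, 0.07, 0.08]]    # LoRA down-projection (r x d_in)
B = [[1.0, 0.0],
     [0.0, 1.0],
     [0.5, 0.5]]                  # LoRA up-projection (d_out x r)
r, alpha = 2, 4                   # same alpha/r = 2 ratio as this card's r=16, alpha=32
scale = alpha / r

x = [1.0, -1.0, 2.0, 0.5]

# Unmerged: base path plus scaled adapter path, computed separately
unmerged = [base + scale * lora for base, lora in zip(matvec(W, x), matvec(B, matvec(A, x)))]

# Merged: one weight, one matvec, no adapter code needed at inference time
merged = matvec(merge_lora(W, B, A, scale), x)

print("unmerged:", [round(v, 6) for v in unmerged])
print("merged:  ", [round(v, 6) for v in merged])
```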
Sample test results (4 problems)
| Type | Result | Response length |
|---|---|---|
| Math (train speed) | 80 km/h | 771 chars |
| Logic (height comparison) | Carol | 354 chars |
| Code (prime check) | Python function | 1,712 chars |
| Korean (minimum wage) | 1,577,600 won | 142 chars |

Answers are naturally structured with Markdown, LaTeX, and step-by-step reasoning.
⚠️ Limitations
- Scale: 2.3B parameters (small model)
- Korean-language arithmetic: occasional numeric errors (a small-model limitation)
- Long context: trained with max_length=4,096
- `<think>` tags: rarely emitted explicitly (reasoning is integrated into the main response)
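Because the model seldom emits explicit `<think>` tags, downstream code should not assume a separate reasoning block. The helper below is a hypothetical utility, not part of the model repository: it splits on `</think>` when that tag is present (with or without an opening `<think>`) and otherwise treats the whole decoded text as the answer.

```python
from typing import Tuple


def split_reasoning(text: str) -> Tuple[str, str]:
    """Split decoded output into (reasoning, answer).

    If a closing </think> tag is present, everything before it (minus an optional
    opening <think>) is treated as reasoning; otherwise reasoning is empty and the
    whole text is the answer.
    """
    head, sep, tail = text.partition("</think>")
    if not sep:
        return "", text.strip()
    reasoning = head.replace("<think>", "", 1).strip()
    return reasoning, tail.strip()


# Typical case for this model: reasoning is inline, with no tags at all
print(split_reasoning("Time = 45/60 = 0.75 h, so speed = 60/0.75 = 80 km/h."))
# Tagged case, in case explicit tags do appear
print(split_reasoning("<think>60 / 0.75 = 80</think>\nThe speed is **80 km/h**."))
```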
🪪 License
- Base model: Apache 2.0 (Qwen)
- Training data: see each dataset's individual license
- This model: Apache 2.0
Credits
- Base: Qwen team (Alibaba)
- Teacher: Anthropic (Claude Opus 4.5/4.6, Sonnet 4.6)
- Data publishers: nohurry, TeichAI
- Training & Release: FINAL-Bench / VIDRAFT_LAB
Related models
- 🧠 FINAL-Bench/Qwen3.5-2B-Opus-Distill-v1: the standalone LoRA adapter version of this model
- ⚡ FINAL-Bench/Qwen3.5-2B-Opus-SDPO-v1: Phase 4 variant further improved with SDPO self-distillation
Darwin V8 · Part of the evolutionary model merging series by VIDRAFT_LAB