Thanarit/Thai-Voice-Test7
Created by UKA — AI Agent, Hacker & Cyber Security Expert
Fine-tuned OmniVoice for Thai speech synthesis (Thai TTS)
OmniVoice Thai is a Thai text-to-speech model fine-tuned from k2-fsa/OmniVoice (Qwen3-0.6B) using diffusion-style masked token prediction (MaskGIT-style).
!pip install omnivoice
from omnivoice import OmniVoice
import soundfile as sf
model = OmniVoice.from_pretrained("hotdogs/omnivoice-thai")
# Generate speech from text
audio = model.generate(
text="สวัสดีครับ วันนี้อากาศดีมากเลย",
instruct="male, low pitch",
)
sf.write("output.wav", audio[0], 24000)
- k2-fsa/OmniVoice (Qwen3-0.6B, MaskGIT diffusion)
- torchaudio, tokenized with eustlb/higgs-audio-v2-tokenizer
- batch_tokens: 2,048 (per GPU forward pass)
- gradient_accumulation_steps: 8 (effective batch ≈ 16,384 tokens)
- learning_rate: 1e-5, cosine schedule, warmup 2%
- max_steps: 30,000, early stop when per-step loss < 3.0
- mixed_precision: fp16, attn_implementation: sdpa

| Tool | Purpose |
|---|---|
| OmniVoice | TTS framework (MaskGIT) |
| PyTorch 2.8 + CUDA 13.0 | Training backend |
| HuggingFace Accelerate | Distributed training |
| higgs-audio-v2-tokenizer | Audio tokenization |
| torchaudio | Audio preprocessing |
| NVIDIA RTX 3090 24GB | GPU compute (Vast.ai) |
| Hermes Agent | Autonomous AI agent for orchestration |
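
The batch and learning-rate settings listed above the tool table can be checked with a little arithmetic. The sketch below is not taken from the training code: it assumes a single GPU, a linear warmup, and a cosine decay to zero, none of which the card states explicitly. The constant and function names are illustrative.

```python
import math

# Values copied from the hyperparameter list in this card.
BATCH_TOKENS = 2_048          # tokens per GPU forward pass
GRAD_ACCUM_STEPS = 8
PEAK_LR = 1e-5
MAX_STEPS = 30_000
WARMUP_STEPS = MAX_STEPS * 2 // 100   # warmup 2% -> 600 steps


def tokens_per_optimizer_step(num_gpus=1):
    """Effective batch size in tokens (num_gpus=1 is an assumption)."""
    return BATCH_TOKENS * GRAD_ACCUM_STEPS * num_gpus


def lr_at(step):
    """Assumed schedule: linear warmup, then cosine decay to zero."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (MAX_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


print(tokens_per_optimizer_step())   # 16384
print(f"{lr_at(1747):.2e}")          # 9.96e-06
```

Under these assumptions the schedule gives 9.96e-06 at step 1747, which agrees with the learning rate in the training log at the end of this card.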
OmniVoice Thai is a Thai TTS model fine-tuned from k2-fsa/OmniVoice (Qwen3-0.6B) using MaskGIT-style masked token prediction.
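
For readers unfamiliar with MaskGIT-style decoding, here is a toy sketch of the general idea: start from a fully masked token sequence, predict every position in parallel, commit the most confident predictions, and re-mask the rest on a cosine schedule. This is not OmniVoice's implementation. `toy_predictor` is a random stand-in for the model, and `MASK`, the vocabulary size, and the step count are hypothetical.

```python
import math
import random

MASK = -1  # hypothetical sentinel for a masked position


def toy_predictor(tokens, vocab_size, rng):
    """Stand-in for the model: one (token, confidence) pair per position."""
    return [(rng.randrange(vocab_size), rng.random()) for _ in tokens]


def maskgit_decode(length, vocab_size=1024, steps=8, seed=0):
    rng = random.Random(seed)
    tokens = [MASK] * length
    for step in range(1, steps + 1):
        # Cosine schedule: number of positions left masked after this step.
        keep_masked = int(length * math.cos(math.pi / 2 * step / steps))
        preds = toy_predictor(tokens, vocab_size, rng)
        masked = [i for i, t in enumerate(tokens) if t == MASK]
        # Commit the most confident predictions among masked positions.
        masked.sort(key=lambda i: preds[i][1], reverse=True)
        n_commit = len(masked) - keep_masked
        for i in masked[:n_commit]:
            tokens[i] = preds[i][0]
    return tokens


audio_tokens = maskgit_decode(length=50)
print(len(audio_tokens), MASK in audio_tokens)  # 50 False
```

In a real system the predictor would be the fine-tuned transformer conditioned on the text, and the resulting audio tokens would be passed to the audio tokenizer's decoder to produce a waveform.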
!pip install omnivoice
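from omnivoice import OmniVoice
import soundfile as sf
model = OmniVoice.from_pretrained("hotdogs/omnivoice-thai")
# Voice cloning: imitate the speaker in a reference recording
audio = model.generate(
text="Hello, this is a test.",
ref_audio="reference.wav",
)
sf.write("cloned.wav", audio[0], 24000)
# Voice design: describe the target voice in text
audio = model.generate(
text="The weather is nice today.",
instruct="female, high pitch, british accent",
)
sf.write("designed.wav", audio[0], 24000)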
Step 1747/30000 | loss=2.8775 (per-step) | lr=9.96e-06
Step 1700 | train/loss: 4.4281 (smoothed) | epoch: 1
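
The two log lines report different views of the loss: a per-step value already below the 3.0 early-stop threshold, and a smoothed value that is still above 4. The sketch below shows how that can happen. It assumes the smoothed value is an exponential moving average with decay 0.98; the card does not say which smoothing is used. The loss values are synthetic and the function name is illustrative.

```python
def train_with_early_stop(losses, threshold=3.0, ema_decay=0.98):
    """Stop at the first step whose per-step loss drops below `threshold`.

    Returns (step, per_step_loss, smoothed_loss), or None if never triggered.
    The EMA smoothing is an assumption, not the card's documented method.
    """
    ema = None
    for step, loss in enumerate(losses, start=1):
        ema = loss if ema is None else ema_decay * ema + (1 - ema_decay) * loss
        if loss < threshold:
            return step, loss, ema
    return None


# Synthetic losses: one noisy step dips below the threshold.
result = train_with_early_stop([5.0, 4.5, 4.0, 2.9, 4.2])
step, per_step, smoothed = result
print(step, per_step, smoothed > 3.0)  # 4 2.9 True
```

Because a single low step can trigger a per-step criterion while the smoothed loss lags well behind, a stop condition on the smoothed value would be a more conservative choice.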