`btsee/oron-tts` is a non-autoregressive text-to-speech model based on F5-TTS (Flow Matching + Diffusion Transformer) for Mongolian (Khalkha Cyrillic) and Kazakh (Cyrillic).
| Parameter | Value |
|---|---|
| Architecture | F5-TTS (OT-CFM + DiT + Vocos) |
| dim | 1024 |
| depth | 22 |
| heads | 16 |
| vocab_size | 65 |
| sample_rate | 24000 Hz |
| mel_bins | 100 |
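
As a minimal sketch, the hyperparameters in the table can be collected into a single configuration object. The class name `F5TTSConfig` and the `head_dim` helper are hypothetical and not taken from the repository; whether `F5TTS.from_config` accepts an object of this shape is an assumption, so check the repository's own config files before relying on it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class F5TTSConfig:
    """Hypothetical container for the values in the table above."""

    dim: int = 1024          # transformer hidden size
    depth: int = 22          # number of DiT blocks
    heads: int = 16          # attention heads per block
    vocab_size: int = 65     # size of the text vocabulary
    sample_rate: int = 24000  # output audio sample rate in Hz
    mel_bins: int = 100      # mel-spectrogram channels

    @property
    def head_dim(self) -> int:
        # Each attention head gets an equal slice of the hidden size.
        if self.dim % self.heads != 0:
            raise ValueError("dim must be divisible by heads")
        return self.dim // self.heads


config = F5TTSConfig()
print(config.head_dim)  # 1024 // 16 = 64
```

With the listed values, each of the 16 attention heads works on a 64-dimensional slice of the 1024-dimensional hidden state.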
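```python
from src.models.f5tts import F5TTS
from src.utils.checkpoint import CheckpointManager

# `config` is not defined in this snippet; it must be created beforehand
# with the hyperparameters listed in the table above.
model = F5TTS.from_config(config)

# Load the checkpoint weights onto the GPU.
cm = CheckpointManager("checkpoints")
cm.load(model, path="f5tts_best.pt", device="cuda")

# Synthesize Mongolian speech, using ref.wav as the reference audio.
wav = model.synthesize(
    text="Сайн байна уу",
    lang="mn",
    ref_audio_path="ref.wav",
)
```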
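
The return type of `model.synthesize` is not documented here. The sketch below assumes it yields a one-dimensional sequence of float samples in the range [-1.0, 1.0] at the 24000 Hz sample rate from the table, and shows one way to write such samples to a 16-bit PCM WAV file using only the Python standard library. The helper `save_wav` is hypothetical and not part of the repository; a synthetic tone stands in for real model output.

```python
import math
import os
import struct
import tempfile
import wave


def save_wav(samples, path, sample_rate=24000):
    """Write mono float samples in [-1.0, 1.0] as 16-bit PCM (hypothetical helper)."""
    frames = bytearray()
    for s in samples:
        # Clip to the valid range, then scale to signed 16-bit integers.
        s = max(-1.0, min(1.0, float(s)))
        frames += struct.pack("<h", int(round(s * 32767)))
    with wave.open(path, "wb") as f:
        f.setnchannels(1)       # mono
        f.setsampwidth(2)       # 2 bytes = 16 bits per sample
        f.setframerate(sample_rate)
        f.writeframes(bytes(frames))


# Stand-in for model output: 0.1 s of a 440 Hz tone at 24000 Hz.
tone = [0.5 * math.sin(2 * math.pi * 440 * n / 24000) for n in range(2400)]
out_path = os.path.join(tempfile.gettempdir(), "oron_tts_demo.wav")
save_wav(tone, out_path)
```

If `synthesize` returns a tensor or array instead, it would first need to be moved to the CPU and flattened to one dimension before being passed to a helper like this.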
Trained on btsee/mbspeech_mn (3,846 Mongolian speech samples).
License: MIT