ZONOS2

ZONOS2 is Zyphra's autoregressive text-to-speech model. It generates multi-codebook audio tokens, decodes them to 44.1 kHz audio with a DAC codec, and can condition on a speaker taken from reference audio.

Original model: Zyphra/ZONOS2

Supported Repositories

| Repository | Format | Notes |
|---|---|---|
| mlx-community/Zyphra-ZONOS2 | BF16 | Official MLX conversion for mlx-audio |

Installation

pip install mlx-audio

Usage

Python API:

from mlx_audio.audio_io import write as audio_write
from mlx_audio.tts import load

model = load("mlx-community/Zyphra-ZONOS2", lazy=True)

result = next(model.generate(
    text="Hello, this is ZONOS two running locally with MLX audio.",
    max_tokens=220,
))

audio_write("zonos2.wav", result.audio, result.sample_rate)
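The max_tokens parameter counts audio token frames, not seconds. The longest clip a call can produce is therefore max_tokens divided by the codec's frame rate. This card does not state that frame rate. The sketch below assumes a DAC-style codec that emits one token frame per hop_length audio samples. The hop length in the example is a hypothetical placeholder: read the real value from the model's codec config, and treat the results as approximate.

```python
import math


def frames_per_second(sample_rate: int, hop_length: int) -> float:
    """Codec frame rate, assuming one token frame per `hop_length` samples."""
    return sample_rate / hop_length


def max_seconds(max_tokens: int, fps: float) -> float:
    """Longest clip (in seconds) a given max_tokens budget can produce."""
    return max_tokens / fps


def tokens_for_seconds(seconds: float, fps: float) -> int:
    """Smallest max_tokens budget that leaves room for `seconds` of audio."""
    return math.ceil(seconds * fps)


if __name__ == "__main__":
    # HYPOTHETICAL hop length: check the real codec config before relying on it.
    fps = frames_per_second(44_100, hop_length=512)
    for budget in (220, 1024):
        print(f"max_tokens={budget}: at most {max_seconds(budget, fps):.1f} s")
```

If generated speech sounds cut off at the end, the budget was probably too small for the text. Raising max_tokens is the first thing to try.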

Voice Cloning

Pass a short reference clip with ref_audio. Clean speech-only clips work best.

result = next(model.generate(
    text="This text will be spoken with the reference speaker.",
    ref_audio="speaker.wav",
    max_tokens=220,
))
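The Generation Parameters table lists ref_audio as accepting either a path or an array. One way to follow the "clean speech-only" advice is to trim silence from the recording before passing it in. The helper below is a hypothetical pure-Python sketch, not part of mlx-audio. It assumes mono float samples roughly in [-1, 1]. The silence threshold and the 10-second cap are arbitrary assumptions, not values from the model card.

```python
from typing import List, Sequence


def trim_reference(
    samples: Sequence[float],
    sample_rate: int,
    threshold: float = 0.02,    # assumption: |amplitude| below this counts as silence
    max_seconds: float = 10.0,  # assumption: keep the reference clip short
) -> List[float]:
    """Strip leading/trailing near-silence and cap the clip length."""
    # Indices of samples loud enough to count as speech.
    loud = [i for i, s in enumerate(samples) if abs(s) >= threshold]
    if not loud:
        return []  # nothing above the threshold: no usable speech
    start, end = loud[0], loud[-1] + 1
    # Never keep more than max_seconds of audio after the first loud sample.
    limit = start + int(max_seconds * sample_rate)
    return list(samples[start:min(end, limit)])
```

Before passing the trimmed samples as ref_audio, convert them to whatever array type your mlx-audio version expects.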

In the Python API, you can also compute the speaker embedding once and reuse it across calls:

speaker = model.extract_speaker_embedding("speaker.wav")

result = next(model.generate(
    text="This reuses a precomputed speaker embedding.",
    speaker_embedding=speaker,
    max_tokens=220,
))
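When many lines share one voice, a small cache keyed on the reference file avoids recomputing the embedding for every call. The SpeakerEmbeddingCache class below is a hypothetical helper, not part of mlx-audio. It wraps any extractor callable, such as model.extract_speaker_embedding from the example above.

```python
import os
from typing import Any, Callable, Dict


class SpeakerEmbeddingCache:
    """Hypothetical helper: compute each reference clip's embedding only once."""

    def __init__(self, extract: Callable[[str], Any]) -> None:
        self._extract = extract  # e.g. model.extract_speaker_embedding
        self._cache: Dict[str, Any] = {}

    def get(self, path: str) -> Any:
        # Normalize the key so "speaker.wav" and "./speaker.wav" share an entry.
        key = os.path.abspath(path)
        if key not in self._cache:
            # Cache miss: call the (expensive) extractor once for this file.
            self._cache[key] = self._extract(path)
        return self._cache[key]


# Usage with the model loaded earlier (not executed here):
# cache = SpeakerEmbeddingCache(model.extract_speaker_embedding)
# for line in ["First line.", "Second line."]:
#     result = next(model.generate(text=line,
#                                  speaker_embedding=cache.get("speaker.wav"),
#                                  max_tokens=220))
```

The cache never checks whether the file has changed on disk. If you edit a reference clip during a session, create a new cache.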

CLI

python -m mlx_audio.tts.generate \
  --model mlx-community/Zyphra-ZONOS2 \
  --text "Hello, this is ZONOS two running with MLX audio." \
  --output_path outputs \
  --file_prefix zonos2

Voice cloning:

python -m mlx_audio.tts.generate \
  --model mlx-community/Zyphra-ZONOS2 \
  --text "This text will use the voice from the reference clip." \
  --ref_audio speaker.wav \
  --output_path outputs \
  --file_prefix zonos2_clone
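To render several lines from a script while still using the CLI, you can build the command shown above once per text. The helper below uses only the flags that appear in these CLI examples. The zonos2_000, zonos2_001, ... prefix scheme is a hypothetical choice.

```python
import subprocess
import sys
from typing import List, Optional


def zonos2_cli_args(
    text: str,
    file_prefix: str,
    output_path: str = "outputs",
    ref_audio: Optional[str] = None,
) -> List[str]:
    """Build the mlx_audio.tts.generate command line used in the CLI examples."""
    args = [
        sys.executable, "-m", "mlx_audio.tts.generate",
        "--model", "mlx-community/Zyphra-ZONOS2",
        "--text", text,
        "--output_path", output_path,
        "--file_prefix", file_prefix,
    ]
    if ref_audio is not None:
        args += ["--ref_audio", ref_audio]  # voice cloning
    return args


def render_lines(lines: List[str], ref_audio: Optional[str] = None) -> None:
    # One CLI process per line; check=True stops on the first failure.
    for i, line in enumerate(lines):
        subprocess.run(zonos2_cli_args(line, f"zonos2_{i:03d}", ref_audio=ref_audio), check=True)
```

Each CLI invocation loads the model from scratch. For long batches, a loop over model.generate in the Python API is likely faster because the weights load only once.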

Generation Parameters

| Parameter | Default | Description |
|---|---|---|
| ref_audio | None | Reference audio path or array for voice cloning |
| speaker_embedding | None | Precomputed 2048-D speaker embedding (Python API only) |
| max_tokens | 1024 | Maximum number of audio token frames to generate |
| temperature | 1.15 | Sampling temperature |
| top_k | 106 | Top-k sampling filter |
| top_p | 0.0 | Nucleus (top-p) sampling filter; disabled at 0 |
| min_p | 0.18 | Minimum-probability sampling filter |
| repetition_penalty | 1.2 | Repetition penalty applied to recent audio tokens |
| seed | None | Seed for deterministic sampling |
| text_normalization | True | Toggle for English text normalization (Python API only) |
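The sampling parameters above are standard filters over a next-token distribution. To illustrate how they interact, the sketch below applies the repetition penalty, temperature, top-k, top-p and min-p to a single logits vector in plain Python, using the table's defaults. A seeded random.Random stands in for seed. Several details here are common conventions, not confirmed ZONOS2 behavior, and the model's actual sampler may differ:

- the order of the filters,
- the CTRL-style repetition penalty,
- min_p measured relative to the top token's probability.

```python
import math
import random
from typing import Iterable, List, Optional


def sample_token(
    logits: List[float],
    recent_tokens: Iterable[int] = (),
    temperature: float = 1.15,
    top_k: int = 106,
    top_p: float = 0.0,  # 0 disables nucleus filtering, as in the table
    min_p: float = 0.18,
    repetition_penalty: float = 1.2,
    seed: Optional[int] = None,
) -> int:
    logits = list(logits)
    # Repetition penalty (CTRL-style assumption): make recent tokens less likely.
    for t in set(recent_tokens):
        logits[t] = logits[t] / repetition_penalty if logits[t] > 0 else logits[t] * repetition_penalty
    # Temperature scaling, then a numerically stable softmax.
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Candidate token ids, most probable first.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if top_k > 0:
        order = order[:top_k]
    if top_p > 0:
        kept, cum = [], 0.0
        for i in order:  # keep the smallest prefix whose mass reaches top_p
            kept.append(i)
            cum += probs[i]
            if cum >= top_p:
                break
        order = kept
    if min_p > 0:
        # Drop tokens below min_p times the top token's probability.
        floor = min_p * probs[order[0]]
        order = [i for i in order if probs[i] >= floor]
    # random.choices normalizes the weights, so survivors are renormalized.
    # In a real decoding loop you would create the RNG once, not per token.
    rng = random.Random(seed)
    return rng.choices(order, weights=[probs[i] for i in order], k=1)[0]
```

With the defaults, min_p does most of the pruning when one token dominates. A strongly peaked distribution can collapse to a single candidate even though top_k allows 106.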


License

See the Zyphra/ZONOS2 model card for upstream license and usage details.

Model size: 8B params (Safetensors, BF16 tensors, MLX format)