---
license: mit
library_name: vibevoice.cpp
tags:
- tts
- asr
- speech
- vibevoice
- gguf
- ggml
base_model:
- microsoft/VibeVoice-Realtime-0.5B
- microsoft/VibeVoice-ASR
---

# vibevoice.cpp: quantized model bundle

**Brought to you by the [LocalAI](https://github.com/mudler/LocalAI) team**, the creators of LocalAI, the open-source AI engine that runs any model (LLMs, vision, voice, image, video) on any hardware. No GPU required.

Quantized GGUF weights for [vibevoice.cpp](https://github.com/mudler/vibevoice.cpp),
a C/C++ port of Microsoft VibeVoice (TTS + ASR) built on top of `ggml`.

| File | Source | Quantization | Size |
| ---- | ------ | ------------ | ---- |
| `vibevoice-realtime-0.5B-q8_0.gguf` | `microsoft/VibeVoice-Realtime-0.5B` | Q8_0 (matmul) + F16 | ~1.6 GB |
| `vibevoice-asr-q8_0.gguf` | `microsoft/VibeVoice-ASR` | Q8_0 (matmul) + F16 | ~13 GB |
| `voice-en-Carter_man.gguf` | upstream voice prompt cache | F16 | 8 MB |
| `voice-en-Emma.gguf` | upstream voice prompt cache | F16 | 6 MB |
| `tokenizer.gguf` | Qwen2.5 BPE + VibeVoice special tokens | n/a | 6 MB |

## Quantization scheme

`scripts/quantize_gguf.py` in the source repo quantizes only the LM matmul
weights (attention q/k/v/o, FFN gate/up/down, and `lm_head`) to Q8_0.
Everything else (1-D conv kernels, RMSNorm scales, biases, layer-scale
gammas, token embeddings, small scalars) passes through unchanged. The
conv1d implementation in vibevoice.cpp casts kernels to F16 inline instead
of dequantizing them on the fly, so Q8_0 conv kernels would corrupt the
convolution outputs.
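
The selection rule above amounts to a name-based filter over tensors. Here is a minimal sketch, assuming Hugging Face Qwen2-style tensor names (`q_proj`, `gate_proj`, `lm_head`, ...). The names actually written by `scripts/quantize_gguf.py` may differ, and `should_quantize` is a hypothetical helper that is not part of the repo:

```bash
# Hypothetical helper: succeed (exit 0) if a tensor should be stored as Q8_0,
# fail (exit 1) if it should pass through unchanged.
# Assumes HF Qwen2-style tensor names; the real GGUF names may differ.
should_quantize() {
  case "$1" in
    # attention q/k/v/o matmul weights (their biases end in .bias and fall through)
    *.q_proj.weight|*.k_proj.weight|*.v_proj.weight|*.o_proj.weight) return 0 ;;
    # FFN gate/up/down matmul weights
    *.gate_proj.weight|*.up_proj.weight|*.down_proj.weight) return 0 ;;
    # output projection
    lm_head.weight|*.lm_head.weight) return 0 ;;
    # conv kernels, RMSNorm scales, biases, gammas, embeddings, scalars
    *) return 1 ;;
  esac
}

for t in model.layers.0.self_attn.q_proj.weight \
         model.layers.0.self_attn.q_proj.bias \
         model.layers.0.input_layernorm.weight \
         model.embed_tokens.weight \
         lm_head.weight; do
  if should_quantize "$t"; then echo "Q8_0  $t"; else echo "keep  $t"; fi
done
```

With these sample names, only the `q_proj` weight and `lm_head.weight` are marked Q8_0. The `q_proj` bias, the norm scale and the embedding table are kept as they are.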

Q8_0 was chosen because it can be written in pure Python with `gguf-py`
and gives a ~60% size reduction on the 7B ASR model, with no measurable
quality regression in the closed-loop TTS → ASR round-trip test.
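
For reference, a Q8_0 block in ggml packs 32 weights as one F16 scale plus 32 int8 codes. That is 34 bytes per block, or 8.5 bits per weight versus 16 for F16. The sketch below reproduces the per-block math (scale `d = max|x| / 127`, codes `round(x / d)`) on a single line of numbers. `q8_0_quantize` is a hypothetical name. It keeps the scale as a double rather than F16 and is not the `gguf-py` implementation:

```bash
# Hypothetical sketch of Q8_0 for one block (ggml uses blocks of 32 weights):
# one scale d = max|x| / 127 per block, then each weight becomes the int8 code
# round(x / d), rounding half away from zero. ggml stores d as F16; awk keeps
# it as a double here, so this only illustrates the math.
q8_0_quantize() {
  awk '{
    amax = 0
    for (i = 1; i <= NF; i++) { a = ($i < 0) ? -$i : $i; if (a > amax) amax = a }
    d = amax / 127                      # per-block scale
    codes = ""
    for (i = 1; i <= NF; i++) {
      v = (d > 0) ? $i / d : 0
      q = (v < 0) ? int(v - 0.5) : int(v + 0.5)
      codes = codes (i > 1 ? " " : "") q
    }
    printf "scale=%.6g codes=%s\n", d, codes
  }'
}

# Dequantization is q * d, so each weight is off by at most d / 2.
echo "1.27 -0.5 0.01" | q8_0_quantize   # prints: scale=0.01 codes=127 -50 1
```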

## Quickstart

```bash
git clone --recursive https://github.com/mudler/vibevoice.cpp
cd vibevoice.cpp && cmake -B build -DVIBEVOICE_BUILD_TESTS=ON && cmake --build build -j

# Pull this bundle into ./models (the hf CLI ships with: pip install -U huggingface_hub)
hf download mudler/vibevoice.cpp-models --local-dir models

# TTS: synthesize hello.wav with the Carter voice
build/bin/vibevoice-cli tts \
  --model models/vibevoice-realtime-0.5B-q8_0.gguf \
  --voice models/voice-en-Carter_man.gguf \
  --tokenizer models/tokenizer.gguf \
  --text "Hello world this is a test of the synthesis system." \
  --out hello.wav

# ASR: transcribe it back
build/bin/vibevoice-cli asr \
  --model models/vibevoice-asr-q8_0.gguf \
  --tokenizer models/tokenizer.gguf \
  --audio hello.wav
# -> [{"Start":0,"End":2.8,"Speaker":0,"Content":"Hello world, this is a test of the synthesis system."}]
```

## Closed-loop verification

The `test_closed_loop` ctest in vibevoice.cpp runs TTS → ASR end-to-end
and asserts ≥80% source-word recall in the recovered transcript. With
this bundle (both models at Q8_0) it passes at 10/10 (100%).
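
The pass criterion can be illustrated with a small recall function. This is a hypothetical sketch and not the code of `test_closed_loop`. It lowercases both texts, treats any run of non-alphanumeric characters as a word separator, and counts a source word as recalled if it appears anywhere in the transcript. The real test's normalization and matching rules may differ:

```bash
# Hypothetical source-word recall: share of words in the TTS input text that
# also occur in the ASR transcript, plus a pass/fail verdict at >= 80%.
word_recall() {
  printf '%s\n%s\n' "$1" "$2" | awk '
    function normalize(s) {
      s = tolower(s)
      gsub(/[^a-z0-9]+/, " ", s)          # punctuation/whitespace -> single space
      return s
    }
    NR == 1 { n = split(normalize($0), src, " ") }
    NR == 2 { m = split(normalize($0), hyp, " "); for (i = 1; i <= m; i++) seen[hyp[i]] = 1 }
    END {
      hit = 0
      for (i = 1; i <= n; i++) if (src[i] in seen) hit++
      verdict = (n > 0 && 5 * hit >= 4 * n) ? "pass" : "fail"   # hit/n >= 0.8
      printf "%d/%d %s\n", hit, n, verdict
    }'
}

word_recall "Hello world this is a test of the synthesis system." \
            "Hello world, this is a test of the synthesis system."
# prints: 10/10 pass
```

The threshold check uses integer arithmetic (`5 * hit >= 4 * n`) so that exactly 80% counts as a pass without any floating-point rounding.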

## License

Weights are derived from Microsoft VibeVoice
([VibeVoice-Realtime-0.5B](https://huggingface.co/microsoft/VibeVoice-Realtime-0.5B)
and [VibeVoice-ASR](https://huggingface.co/microsoft/VibeVoice-ASR));
follow the upstream model licenses for use. The conversion and quantization
tooling is released under MIT as part of vibevoice.cpp.