# whisper-tiny-it-multi-ggml
whisper.cpp GGML quantizations of LocalAI-io/whisper-tiny-it-multi for fast CPU/GPU inference.
Author: Ettore Di Giacinto
Brought to you by the LocalAI team. These models can be used directly with LocalAI and any whisper.cpp-based runtime.
## Files
| File | Quantization | Description |
|---|---|---|
| ggml-model-f16.bin | float16 | Full precision (no quantization) — highest quality |
| ggml-model-q8_0.bin | int8 | 8-bit quantization — minimal quality loss |
| ggml-model-q5_0.bin | int5 | 5-bit quantization — good quality/size tradeoff |
| ggml-model-q4_0.bin | int4 | 4-bit quantization — smallest size, fastest |
## Training
Fine-tuned openai/whisper-tiny (39M params) on Common Voice 25.0 + MLS + VoxPopuli + FLEURS Italian.
See LocalAI-io/whisper-tiny-it-multi for the full safetensors model and detailed WER results.
## Usage
### whisper.cpp

```sh
# Download a quant
huggingface-cli download LocalAI-io/whisper-tiny-it-multi-ggml ggml-model-q5_0.bin --local-dir .

# Run
./whisper-cli -m ggml-model-q5_0.bin -f audio.wav -l it
```
### Python bindings (pywhispercpp)

```python
from pywhispercpp.model import Model

model = Model("ggml-model-q5_0.bin", language="it")
segments = model.transcribe("audio.wav")
for seg in segments:
    print(seg.text)
```
### LocalAI

```yaml
# In your LocalAI model config
name: whisper-tiny-it-multi
backend: whisper
parameters:
  model: ggml-model-q5_0.bin
```
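With LocalAI running and the config above loaded, the model can be queried through LocalAI's OpenAI-compatible transcription endpoint (a usage sketch; the host, port, and audio file name are assumptions for illustration):

```shell
# Transcribe a local WAV file via LocalAI's OpenAI-compatible API
curl http://localhost:8080/v1/audio/transcriptions \
  -H "Content-Type: multipart/form-data" \
  -F model="whisper-tiny-it-multi" \
  -F file="@audio.wav"
```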
## Links
- HF Safetensors: LocalAI-io/whisper-tiny-it-multi
- CTranslate2 INT8: LocalAI-io/whisper-tiny-it-multi-ct2-int8
- Code: github.com/localai-org/whisper-it
- whisper.cpp: github.com/ggerganov/whisper.cpp
- LocalAI: github.com/mudler/LocalAI