whisper.cpp GGML quantizations of LocalAI-io/whisper-medium-it-multi for fast CPU/GPU inference.
Author: Ettore Di Giacinto
Brought to you by the LocalAI team. These models can be used directly with LocalAI and any whisper.cpp-based runtime.
| File | Quantization | Description |
|---|---|---|
| ggml-model-f16.bin | float16 | Full precision (no quantization) — highest quality |
| ggml-model-q8_0.bin | int8 | 8-bit quantization — minimal quality loss |
| ggml-model-q5_0.bin | int5 | 5-bit quantization — good quality/size tradeoff |
| ggml-model-q4_0.bin | int4 | 4-bit quantization — smallest size, fastest |
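As a rough guide to the size tradeoff in the table, each file's size can be estimated from the parameter count times the bits per weight. This is a back-of-the-envelope sketch only: real GGML files add per-block scale factors and metadata, so actual sizes run somewhat larger.

```python
# Rough size estimate: 769M parameters * bits-per-weight / 8 bytes.
# Ignores GGML per-block scales and file metadata, so real files are larger.
PARAMS = 769_000_000

def approx_size_gb(bits_per_weight: float) -> float:
    """Approximate on-disk size in GB for a given quantization width."""
    return PARAMS * bits_per_weight / 8 / 1e9

for name, bits in [("f16", 16), ("q8_0", 8), ("q5_0", 5), ("q4_0", 4)]:
    print(f"{name}: ~{approx_size_gb(bits):.2f} GB")
```

By this estimate q5_0 is roughly a third the size of the f16 file, which is why it is a common default choice.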
Fine-tuned from openai/whisper-medium (769M parameters) on Common Voice 25.0, MLS, VoxPopuli, and FLEURS (Italian).
See LocalAI-io/whisper-medium-it-multi for the full safetensors model and detailed WER results.
```bash
# Download a quant
huggingface-cli download LocalAI-io/whisper-medium-it-multi-ggml ggml-model-q5_0.bin --local-dir .

# Run
./whisper-cli -m ggml-model-q5_0.bin -f audio.wav -l it
```
```python
from pywhispercpp.model import Model

model = Model("ggml-model-q5_0.bin", language="it")
segments = model.transcribe("audio.wav")
for seg in segments:
    print(seg.text)
```
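Segments also carry start and end timestamps (whisper.cpp reports them in 10 ms ticks), so the loop above can be extended into a simple SRT-style formatter. A sketch using a stand-in dataclass in place of real pywhispercpp segments; the `t0`/`t1`/`text` attribute names and the 10 ms unit are assumptions about the pywhispercpp segment object:

```python
from dataclasses import dataclass

# Stand-in for a pywhispercpp segment (assumed attributes: t0, t1, text).
# whisper.cpp timestamps are assumed to be in 10 ms ticks.
@dataclass
class Segment:
    t0: int   # start time, 10 ms units
    t1: int   # end time, 10 ms units
    text: str

def ticks_to_srt(ticks: int) -> str:
    """Convert 10 ms ticks to an SRT timestamp HH:MM:SS,mmm."""
    ms = ticks * 10
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def to_srt(segments) -> str:
    """Render a list of segments as SRT subtitle blocks."""
    blocks = []
    for i, seg in enumerate(segments, 1):
        blocks.append(
            f"{i}\n{ticks_to_srt(seg.t0)} --> {ticks_to_srt(seg.t1)}\n{seg.text.strip()}\n"
        )
    return "\n".join(blocks)

print(to_srt([Segment(0, 250, " Ciao a tutti."), Segment(250, 540, " Benvenuti.")]))
```

With real pywhispercpp output you would pass `model.transcribe(...)` straight to `to_srt`, assuming the attribute names match.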
```yaml
# In your LocalAI model config
name: whisper-medium-it-multi
backend: whisper
parameters:
  model: ggml-model-q5_0.bin
```
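Once a LocalAI instance is running with that config, transcription requests go through LocalAI's OpenAI-compatible audio endpoint. A sketch assuming a default instance on `localhost:8080` and a local `audio.wav` (host, port, and file name are placeholders):

```shell
# Assumes LocalAI is listening on localhost:8080 and audio.wav exists locally.
# /v1/audio/transcriptions is LocalAI's OpenAI-compatible transcription endpoint.
curl http://localhost:8080/v1/audio/transcriptions \
  -F model="whisper-medium-it-multi" \
  -F file="@audio.wav"
```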
Base model: openai/whisper-medium