whisper.cpp GGML quantizations of LocalAI-io/whisper-medium-it-multi for fast CPU/GPU inference.
Author: Ettore Di Giacinto
Brought to you by the LocalAI team. These models can be used directly with LocalAI and any whisper.cpp-based runtime.
| File | Quantization | Description |
|---|---|---|
| ggml-model-f16.bin | float16 | Full precision (no quantization) — highest quality |
| ggml-model-q8_0.bin | int8 | 8-bit quantization — minimal quality loss |
| ggml-model-q5_0.bin | int5 | 5-bit quantization — good quality/size tradeoff |
| ggml-model-q4_0.bin | int4 | 4-bit quantization — smallest size, fastest |
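As a rough guide to the size tradeoff in the table, each file's size can be estimated from the parameter count times the bits per weight. This is a back-of-the-envelope sketch only: real GGML files add per-block scale factors and metadata, so actual sizes run somewhat larger.

```python
# Rough size estimate: 769M parameters * bits-per-weight / 8 bytes.
# Ignores GGML per-block scales and file metadata, so real files are larger.
PARAMS = 769_000_000

def approx_size_gb(bits_per_weight: float) -> float:
    """Approximate on-disk size in GB for a given quantization width."""
    return PARAMS * bits_per_weight / 8 / 1e9

for name, bits in [("f16", 16), ("q8_0", 8), ("q5_0", 5), ("q4_0", 4)]:
    print(f"{name}: ~{approx_size_gb(bits):.2f} GB")
```

By this estimate q5_0 is roughly a third the size of the f16 file, which is why it is a common default choice.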
Fine-tuned from openai/whisper-medium (769M parameters) on Common Voice 25.0, MLS, VoxPopuli, and FLEURS (Italian).
See LocalAI-io/whisper-medium-it-multi for the full safetensors model and detailed WER results.
```bash
# Download a quant
huggingface-cli download LocalAI-io/whisper-medium-it-multi-ggml ggml-model-q5_0.bin --local-dir .

# Run
./whisper-cli -m ggml-model-q5_0.bin -f audio.wav -l it
```
```python
from pywhispercpp.model import Model

model = Model("ggml-model-q5_0.bin", language="it")
segments = model.transcribe("audio.wav")
for seg in segments:
    print(seg.text)
```
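Segments also carry start and end timestamps (whisper.cpp reports them in 10 ms ticks), so the loop above can be extended into a simple SRT-style formatter. A sketch using a stand-in dataclass in place of real pywhispercpp segments; the `t0`/`t1`/`text` attribute names and the 10 ms unit are assumptions about the pywhispercpp segment object:

```python
from dataclasses import dataclass

# Stand-in for a pywhispercpp segment (assumed attributes: t0, t1, text).
# whisper.cpp timestamps are assumed to be in 10 ms ticks.
@dataclass
class Segment:
    t0: int   # start time, 10 ms units
    t1: int   # end time, 10 ms units
    text: str

def ticks_to_srt(ticks: int) -> str:
    """Convert 10 ms ticks to an SRT timestamp HH:MM:SS,mmm."""
    ms = ticks * 10
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def to_srt(segments) -> str:
    """Render a list of segments as SRT subtitle blocks."""
    blocks = []
    for i, seg in enumerate(segments, 1):
        blocks.append(
            f"{i}\n{ticks_to_srt(seg.t0)} --> {ticks_to_srt(seg.t1)}\n{seg.text.strip()}\n"
        )
    return "\n".join(blocks)

print(to_srt([Segment(0, 250, " Ciao a tutti."), Segment(250, 540, " Benvenuti.")]))
```

With real pywhispercpp output you would pass `model.transcribe(...)` straight to `to_srt`, assuming the attribute names match.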
```yaml
# In your LocalAI model config
name: whisper-medium-it-multi
backend: whisper
parameters:
  model: ggml-model-q5_0.bin
```
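Once a LocalAI instance is running with that config, transcription requests go through LocalAI's OpenAI-compatible audio endpoint. A sketch assuming a default instance on `localhost:8080` and a local `audio.wav` (host, port, and file name are placeholders):

```shell
# Assumes LocalAI is listening on localhost:8080 and audio.wav exists locally.
# /v1/audio/transcriptions is LocalAI's OpenAI-compatible transcription endpoint.
curl http://localhost:8080/v1/audio/transcriptions \
  -F model="whisper-medium-it-multi" \
  -F file="@audio.wav"
```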
Base model: openai/whisper-medium