whispLM-600m is a Speech-LLM for low-resource Urdu automatic speech recognition (ASR). It couples a Whisper-tiny acoustic encoder with a Qwen3-0.6B language model decoder via a learned compression projector, and it is trained with a two-stage curriculum.
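The card does not say how the compression projector is built. One common design in Speech-LLMs is frame stacking: every k consecutive encoder frames are concatenated into one wider vector, which a learned linear layer then maps into the LLM's embedding space. The sketch below shows only the stacking step and its effect on sequence length, in plain Python. The function name `stack_frames`, the factor `k=4`, and the idea that whispLM-600m uses this scheme at all are assumptions made for illustration. The 1500 frames of width 384 correspond to Whisper-tiny's encoder output for one 30-second window.

```python
def stack_frames(frames, k):
    """Concatenate every k consecutive frame vectors into one wider vector.

    Trailing frames that do not fill a complete group of k are dropped.
    """
    usable = len(frames) - len(frames) % k
    return [
        [value for frame in frames[i:i + k] for value in frame]
        for i in range(0, usable, k)
    ]


# Toy stand-in for Whisper-tiny encoder output: 1500 frames, 384 features each.
frames = [[0.0] * 384 for _ in range(1500)]

# Hypothetical compression factor; the real projector may use another value.
stacked = stack_frames(frames, k=4)

print(len(stacked), len(stacked[0]))
```

With these assumed numbers the LLM would receive 375 audio positions instead of 1500, each 1536 wide before the linear projection.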
Evaluated on 1,000 test samples. Normalization: diacritics stripped, NFKC Unicode normalization applied.
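A minimal sketch of such a normalization step, using only the standard library. The function name `normalize_urdu` is hypothetical, and two details are assumptions because the card does not state them: "diacritics" is taken to mean every Unicode nonspacing mark (category `Mn`), and NFKC is applied before stripping so that sequences such as alef + madda compose into a single letter instead of losing the mark.

```python
import unicodedata


def normalize_urdu(text):
    """Apply NFKC, then drop nonspacing marks such as zabar, zer and pesh."""
    composed = unicodedata.normalize("NFKC", text)
    return "".join(ch for ch in composed if unicodedata.category(ch) != "Mn")


# "kitaab" written with a zer (U+0650) under the first letter.
with_diacritic = "\u06A9\u0650\u062A\u0627\u0628"
print(normalize_urdu(with_diacritic) == "\u06A9\u062A\u0627\u0628")
```

Applying the same function to both references and hypotheses before scoring keeps the comparison consistent.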
| Model | WER ↓ | CER ↓ | RTF ↓ |
|---|---|---|---|
| whispLM-600m | 0.5566 | 0.2805 | 0.1953 |
| voxbridge-37m (baseline) | 0.6076 | 0.2455 | 0.0268 |
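For readers unfamiliar with the column headers, the sketch below shows the usual definitions: WER and CER are edit distances at word and character level divided by the reference length, and RTF is processing time divided by audio duration. This is a per-utterance illustration with hypothetical helper names; the card does not say whether its scores are averaged per utterance or pooled over the corpus, and the functions assume a non-empty reference.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (insert, delete, substitute)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edits divided by reference word count."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edits divided by reference length."""
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


def rtf(processing_seconds, audio_seconds):
    """Real-time factor: values below 1.0 mean faster than real time."""
    return processing_seconds / audio_seconds


print(wer("a b c d", "a x c"), cer("abcd", "abxd"), rtf(2.0, 10.0))
```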
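whispLM-600m reduces WER by about 5.1 percentage points absolute (0.6076 to 0.5566), which is roughly 8.4% relative, compared with the fine-tuned Whisper-tiny baseline (voxbridge-37m). The baseline retains the lower CER (0.2455 vs. 0.2805) and a roughly 7x lower RTF.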
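The absolute and relative WER changes can be recomputed from the rounded table values. The short sketch below does so; the variable names are illustrative, and small differences from any figure derived from unrounded scores are expected.

```python
# WER values copied from the results table.
wer_whisplm = 0.5566
wer_baseline = 0.6076

# Absolute reduction: difference in WER, expressed in percentage points.
absolute_points = (wer_baseline - wer_whisplm) * 100

# Relative reduction: the same difference as a fraction of the baseline WER.
relative_percent = (wer_baseline - wer_whisplm) / wer_baseline * 100

print(f"absolute: {absolute_points:.2f} points, relative: {relative_percent:.2f}%")
```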
import torch
import torchaudio
from huggingface_hub import hf_hub_download
from transformers import AutoModelForCausalLM, AutoTokenizer, WhisperModel, WhisperProcessor

device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.bfloat16

# Whisper-tiny feature extractor and acoustic encoder (encoder/ subfolder)
processor = WhisperProcessor.from_pretrained("mahwizzzz/whispLM-600m", subfolder="encoder")
encoder = WhisperModel.from_pretrained(
    "mahwizzzz/whispLM-600m", subfolder="encoder", torch_dtype=dtype
).encoder.to(device).eval()

# Qwen3-0.6B decoder and its tokenizer (llm/ subfolder)
tokenizer = AutoTokenizer.from_pretrained("mahwizzzz/whispLM-600m", subfolder="llm")
llm = AutoModelForCausalLM.from_pretrained(
    "mahwizzzz/whispLM-600m", subfolder="llm", torch_dtype=dtype
).to(device).eval()

# Projector weights are stored as a separate file in the repository
projector_state = torch.load(
    hf_hub_download("mahwizzzz/whispLM-600m", "projector.pt"), map_location=device
)

# Load the audio, mix it down to mono, and resample to 16 kHz if needed
audio, sr = torchaudio.load("audio.wav")
audio = audio.mean(dim=0)
if sr != 16000:
    audio = torchaudio.functional.resample(audio, sr, 16000)

# Log-mel input features for the Whisper encoder
feats = processor.feature_extractor(
    audio.numpy(), sampling_rate=16000, return_tensors="pt"
).input_features.to(dtype).to(device)
Note: `whispLM-600m` is a custom Speech-LLM with separate `encoder/`, `llm/`, and `projector.pt` components. It is not compatible with the default `transformers.pipeline("automatic-speech-recognition")` API.
@misc{mahwiz2026whisplm,
title={whispLM: Two-Stage Speech-LLM for Low-Resource Urdu ASR},
author={Mahwiz Khalil},
year={2026},
publisher={HuggingFace},
url={https://huggingface.co/mahwizzzz/whispLM-600m}
}