English–Twi Code-Switching ASR Model - Kasanoma
Model Overview
This repository contains a checkpoint of openai/whisper-small fine-tuned for English–Twi code-switching speech transcription. It is built on a pretrained Akan-adapted Whisper model (Giftmark/akan-whisper-model) and further fine-tuned on a realistic bilingual dataset of mixed English and Twi utterances.
The model supports natural bilingual speech, including intra-sentential and inter-sentential code-switching.
How to use the model
import torch
from datasets import load_dataset, Audio
from transformers import WhisperForConditionalGeneration, WhisperProcessor
model = WhisperForConditionalGeneration.from_pretrained("Kennethdot/kasanoma_whisper")
processor = WhisperProcessor.from_pretrained("Kennethdot/kasanoma_whisper")
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)
model.eval()
# Sample dataset you could easily use
dataset = load_dataset("Kennethdot/Ghana_English-Twi_Code-switching_ASR", split="test").cast_column("audio", Audio(sampling_rate=16000))
sample = dataset[0]["audio"]
# Convert the waveform to log-mel input features and move them to the model's device
input_features = processor.feature_extractor(sample["array"], sampling_rate=sample["sampling_rate"], return_tensors="pt").input_features
input_features = input_features.to(model.device)
with torch.no_grad():
    predicted_ids = model.generate(input_features, max_new_tokens=128, task="transcribe")
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0].strip()
print(transcription)
Model details
- Task: English-Twi mixed speech transcription
- Base model: GiftMark/akan-whisper-model
- Training data: Kennethdot/Ghana_English-Twi_Code-switching_ASR, which contains 100 hours of English-Twi code-switched speech
Evaluation Results
| Model | CS WER | Twi WER | English WER |
|---|---|---|---|
| Zero-shot Akan Whisper Small | 90.39 | 85.08 | 110.26 |
| Fine-tuned Model | 6.58 | 99.44 | 100.43 |
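For readers unfamiliar with the metric, here is a minimal sketch of word error rate (WER) as it is conventionally defined: the word-level edit distance between reference and hypothesis, divided by the number of reference words and expressed as a percentage. The model card does not say which tool or text normalization produced the numbers above, so the function name `word_error_rate` and the plain whitespace tokenization are assumptions for illustration only. The sketch also shows why WER can exceed 100: insertions count as errors but do not lengthen the reference.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER in percent using word-level Levenshtein distance.

    Illustrative only: tokenizes on whitespace and applies no text
    normalization (casing, punctuation), which real evaluations often do.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return 100.0 * prev[-1] / len(ref)


# One substituted word out of four reference words.
print(word_error_rate("a b c d", "a x c d"))  # 25.0
# Three inserted words against a two-word reference: WER above 100.
print(word_error_rate("a b", "a b c d e"))    # 150.0
```

Scores from this sketch will not necessarily match the table, since normalization choices change the word counts.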
Examples
The model is capable of producing fluent bilingual outputs with preserved punctuation and natural speech patterns:
Example 1: Ma yɛnkɔgye yɛn ani, it has been a long week.
Example 2: Adwuma no yɛ den dodo, I need a vacation.
Example 3: Nsuomnam yɛ dɛ paa, w’atry-i grilled tilapia?
Limitations
The model performs well on English-Twi code-switched speech (6.58 CS WER in the evaluation above), but it is not error-free, and the reported Twi and English WERs are far higher. Whisper natively processes audio in 30-second windows, so apply chunking to longer recordings. The model may also struggle with slang, idioms, informal Twi, spelling variation, names, or sentences far outside the training distribution. Human review is recommended for high-value use.
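One simple way to handle recordings longer than 30 seconds is to split the waveform into fixed-length pieces and transcribe each piece separately. The sketch below is an assumption about how that could be done, not the author's method: `split_into_chunks` is a hypothetical helper, and cutting at fixed boundaries can split a word in two. The Transformers automatic-speech-recognition pipeline also accepts a `chunk_length_s` argument that handles chunking with overlap, which may be the better choice in practice.

```python
def split_into_chunks(audio, sampling_rate=16000, chunk_seconds=30.0):
    """Split a 1-D waveform into consecutive chunks of at most chunk_seconds.

    Hypothetical helper for illustration. `audio` can be any sliceable
    sequence of samples, for example the NumPy array in sample["array"].
    The final chunk may be shorter than the others.
    """
    if sampling_rate <= 0 or chunk_seconds <= 0:
        raise ValueError("sampling_rate and chunk_seconds must be positive")
    chunk_size = int(sampling_rate * chunk_seconds)
    return [
        audio[start:start + chunk_size]
        for start in range(0, len(audio), chunk_size)
    ]


# 70 seconds of silence at 16 kHz yields two full chunks and one short one.
waveform = [0.0] * (16000 * 70)
chunks = split_into_chunks(waveform)
print([len(chunk) / 16000 for chunk in chunks])  # [30.0, 30.0, 10.0]
```

Each chunk can then go through the feature extractor and `model.generate` as in the usage example, with the decoded texts joined in order.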
Ethical Considerations
- The model is intended for research and educational use only
- It should not be used for surveillance or unauthorized speech monitoring
- Bias may exist due to dataset imbalance between languages