
English–Twi Code-Switching ASR Model - Kasanoma

Model Overview

This repository contains a fine-tuned openai/whisper-small checkpoint for English–Twi code-switched speech transcription. It starts from GiftMark/akan-whisper-model, a Whisper Small model already adapted to Akan, and is further fine-tuned on a realistic bilingual dataset of mixed English–Twi utterances.

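The model supports natural bilingual speech, including both intra-sentential code-switching (switching languages within a sentence) and inter-sentential code-switching (switching languages between sentences).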

How to use the model

import torch
from datasets import load_dataset, Audio
from transformers import WhisperForConditionalGeneration, WhisperProcessor

model = WhisperForConditionalGeneration.from_pretrained("Kennethdot/kasanoma_whisper")
processor = WhisperProcessor.from_pretrained("Kennethdot/kasanoma_whisper")

device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)
model.eval()

# Load one clip from the dataset's test split, resampled to the 16 kHz rate Whisper expects
dataset = load_dataset("Kennethdot/Ghana_English-Twi_Code-switching_ASR", split="test").cast_column("audio", Audio(sampling_rate=16000))
sample = dataset[0]["audio"]

# Convert the waveform to log-Mel features; generate() needs the tensor, not the whole BatchFeature
inputs = processor.feature_extractor(sample["array"], sampling_rate=sample["sampling_rate"], return_tensors="pt")
input_features = inputs.input_features.to(device)

with torch.no_grad():
    predicted_ids = model.generate(input_features, max_new_tokens=128, task="transcribe")

# Decode token IDs to text, dropping special tokens such as <|startoftranscript|>
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0].strip()
print(transcription)

Model details

Task: English–Twi code-switched speech transcription
Base model: GiftMark/akan-whisper-model
Training data: Kennethdot/Ghana_English-Twi_Code-switching_ASR, which contains 100 hours of English–Twi code-switched speech

Evaluation Results

Model	Code-switched WER (%)	Twi WER (%)	English WER (%)
Zero-shot Akan Whisper Small	90.39	85.08	110.26
Fine-tuned model (this checkpoint)	6.58	99.44	100.43
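The card does not say how WER was computed (for example, which text normalization was applied before scoring). For reference, word error rate is WER = (S + D + I) / N, where S, D and I are the word substitutions, deletions and insertions needed to align the hypothesis with the reference, and N is the number of reference words. The sketch below implements that standard definition with a word-level edit distance, assuming plain whitespace tokenization and no normalization; it is not the authors' evaluation script, and the function name `word_error_rate` is hypothetical. Libraries such as `jiwer` (`jiwer.wer`) provide the same metric as a fraction rather than a percentage.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER as a percentage: 100 * (S + D + I) / N.

    Assumption: words are split on whitespace with no case or punctuation
    normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev holds the previous DP row: prev[j] is the edit distance between
    # the first i-1 reference words and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # substitution (cost 1) or match (cost 0)
            )
        prev = curr
    return 100.0 * prev[-1] / len(ref)


print(word_error_rate("adwuma no yɛ den dodo", "adwuma no yɛ den paa"))
print(word_error_rate("yes", "yes yes yes"))
```

The first call has one substitution over five reference words (20%). The second shows why WER can exceed 100%: S + D can never exceed N, so any score above 100, such as the zero-shot English result, implies the hypotheses contained inserted words.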

Examples

The model can produce fluent bilingual transcriptions that preserve punctuation and natural phrasing, for example:

Example 1: Ma yɛnkɔgye yɛn ani, it has been a long week.
Example 2: Adwuma no yɛ den dodo, I need a vacation.
Example 3: Nsuomnam yɛ dɛ paa, w’atry-i grilled tilapia?

Limitations

The model performs well on English–Twi mixed speech similar to its training data (6.58 code-switched WER in the evaluation above), but it is not error-free. Whisper processes audio in 30-second windows, so longer recordings must be split into chunks before transcription. The model may also struggle with slang, idioms, informal Twi, spelling variation, names, or sentences far outside the training distribution. Human review is recommended for high-stakes use.
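The feature-extractor call in the usage example pads or truncates each input to 30 seconds, so audio past that point is not transcribed. A minimal sketch of manual chunking, assuming a 1-D NumPy waveform already resampled to 16 kHz; `chunk_audio` and its `overlap_s` parameter are hypothetical helpers for illustration, not part of this repository or of Transformers.

```python
from typing import List

import numpy as np


def chunk_audio(audio: np.ndarray, sampling_rate: int = 16000,
                chunk_s: float = 30.0, overlap_s: float = 0.0) -> List[np.ndarray]:
    """Split a 1-D waveform into windows of at most chunk_s seconds.

    With overlap_s > 0, consecutive windows share overlap_s seconds of audio,
    so a word cut at one boundary appears whole in the neighbouring window.
    """
    if overlap_s < 0 or overlap_s >= chunk_s:
        raise ValueError("overlap_s must be in [0, chunk_s)")
    chunk_len = int(chunk_s * sampling_rate)           # samples per window
    step = int((chunk_s - overlap_s) * sampling_rate)  # samples between window starts

    chunks = []
    start = 0
    while True:
        chunks.append(audio[start:start + chunk_len])
        if start + chunk_len >= len(audio):  # this window reaches the end of the audio
            break
        start += step
    return chunks


sr = 16000
long_clip = np.zeros(70 * sr, dtype=np.float32)  # a 70-second placeholder waveform
print([len(c) / sr for c in chunk_audio(long_clip, sampling_rate=sr)])
```

Each chunk can then go through the same `processor.feature_extractor` and `model.generate` steps as in the usage example, with the transcripts joined in order. With no overlap the joins are simple but a word may be split at a boundary; with overlap, naive joining repeats words, so the overlapping text has to be merged. Alternatively, the Transformers `pipeline("automatic-speech-recognition", ...)` accepts a `chunk_length_s` argument that performs chunking and merges the overlapping outputs.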

Ethical Considerations

The model is intended for research and educational use only
It should not be used for surveillance or unauthorized speech monitoring
Bias may exist due to dataset imbalance between languages


Model tree for Kennethdot/kasanoma_whisper

openai/whisper-small (base model) → GiftMark/akan-whisper-model (fine-tuned) → Kennethdot/kasanoma_whisper (this model)