English–Twi Code-Switching ASR Model - Kasanoma

Model Overview

This repository contains a fine-tuned openai/whisper-small checkpoint for English–Twi code-switching speech transcription. It starts from a pretrained Akan-adapted Whisper model (Giftmark/akan-whisper-model) and is further fine-tuned on a realistic bilingual dataset of utterances that mix English and Twi.

The model supports natural bilingual speech, including intra-sentential and inter-sentential code-switching.

How to use the model

import torch
from datasets import load_dataset, Audio
from transformers import WhisperForConditionalGeneration, WhisperProcessor

model = WhisperForConditionalGeneration.from_pretrained("Kennethdot/kasanoma_whisper")
processor = WhisperProcessor.from_pretrained("Kennethdot/kasanoma_whisper")

device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)
model.eval()

# Load one example from the test split, resampled to 16 kHz (the rate Whisper expects)
dataset = load_dataset("Kennethdot/Ghana_English-Twi_Code-switching_ASR", split="test").cast_column("audio", Audio(sampling_rate=16000))
sample = dataset[0]["audio"]

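# Convert the waveform to log-mel features; .input_features extracts the tensor that generate() expects
input_features = processor.feature_extractor(sample["array"], sampling_rate=sample["sampling_rate"], return_tensors="pt").input_features
input_features = input_features.to(model.device)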

with torch.no_grad():
    predicted_ids = model.generate(input_features, max_new_tokens=128, task="transcribe")

transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0].strip()
print(transcription)

Model details

  • Task: English-Twi mixed speech transcription
  • Base model: GiftMark/akan-whisper-model
  • Training data: Kennethdot/Ghana_English-Twi_Code-switching_ASR, which contains 100 hours of English–Twi code-switched speech

Evaluation Results

Model                          CS WER (%)   Twi WER (%)   English WER (%)
Zero-shot Akan Whisper Small   90.39        85.08         110.26
Fine-tuned Model               6.58         99.44         100.43
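
For readers unfamiliar with the metric, the following is a minimal sketch of word error rate (WER) under its standard definition: substitutions, deletions, and insertions divided by the number of reference words. It is not the evaluation script used for the table above; the function name `word_error_rate` and the plain whitespace tokenization are assumptions, and the card does not state which text normalization was applied. The sketch also shows why a WER above 100 (such as 110.26) is possible: inserted words count as errors but do not increase the reference length.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER in percent: (substitutions + deletions + insertions) / reference words * 100.

    Hypothetical helper for illustration; tokenization is a plain whitespace split.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return 100.0 * prev[-1] / len(ref)


# A perfect transcript scores 0; three inserted words against a
# two-word reference give 3 / 2 = 150 percent.
print(word_error_rate("adwuma no yɛ den", "adwuma no yɛ den"))
print(word_error_rate("a b", "a x b y z"))
```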

Examples

The model can produce fluent bilingual output that preserves punctuation and natural speech patterns:

  • Example 1: Ma yɛnkɔgye yɛn ani, it has been a long week.

  • Example 2: Adwuma no yɛ den dodo, I need a vacation.

  • Example 3: Nsuomnam yɛ dɛ paa, w’atry-i grilled tilapia?

Limitations

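The model performs well on English–Twi code-switched speech (6.58 CS WER in Evaluation Results), but it is not error-free, and the monolingual Twi and English WERs reported there are far higher (99.44 and 100.43). Whisper processes audio in 30-second windows, so longer recordings must be split into chunks before transcription. The model may also struggle with slang, idioms, informal Twi, spelling variation, names, or sentences far outside the training distribution. Human review is recommended for high-value use.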
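
The chunking mentioned above can be sketched as follows. This is an illustrative outline, not the author's method: `chunk_bounds` and `transcribe_long` are hypothetical helpers, and `transcribe_chunk` stands in for any function that runs the model on one audio segment (for example, the feature-extraction, `generate`, and decode steps from the usage section). The sketch cuts at fixed 30-second boundaries with no overlap, so a word that straddles a boundary may be split.

```python
from typing import Callable, List, Sequence, Tuple


def chunk_bounds(num_samples: int, sampling_rate: int = 16000,
                 chunk_s: float = 30.0) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices for consecutive windows of chunk_s seconds."""
    chunk = int(chunk_s * sampling_rate)
    if chunk <= 0:
        raise ValueError("chunk_s * sampling_rate must be at least one sample")
    return [(start, min(start + chunk, num_samples))
            for start in range(0, num_samples, chunk)]


def transcribe_long(audio: Sequence[float],
                    transcribe_chunk: Callable[[Sequence[float]], str],
                    sampling_rate: int = 16000,
                    chunk_s: float = 30.0) -> str:
    """Transcribe each window with the supplied callable and join the results."""
    pieces = [transcribe_chunk(audio[start:end])
              for start, end in chunk_bounds(len(audio), sampling_rate, chunk_s)]
    return " ".join(p.strip() for p in pieces if p.strip())


# A 70-second recording at 16 kHz yields two full windows and a 10-second remainder.
print(chunk_bounds(16000 * 70))
```

As an alternative to manual splitting, the Transformers automatic-speech-recognition pipeline accepts a `chunk_length_s` argument that handles windowing, with overlap, internally.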

Ethical Considerations

  • The model is intended for research and educational use only
  • It should not be used for surveillance or unauthorized speech monitoring
  • Bias may exist due to dataset imbalance between languages