TRANSLTR: English → Tiv Translation Model
Fine-tuned for Tiv, a Benue-Congo language spoken by ~4 million people in Benue State, Nigeria.
Built by Victor Achede under Black Sheep Co.
Model Summary
| Property | Detail |
|---|---|
| Base model | Helsinki-NLP/opus-mt-en-mul |
| Task | Machine translation (EN → TIV) |
| Language pair | English → Tiv (tiv) |
| Architecture | MarianMT (transformer seq2seq) |
| Training data | Custom curated EN-TIV parallel corpus (Bible-domain, conversational) |
| Fine-tuning epochs | 10 |
| Batch size | 32 |
| Hardware | NVIDIA T4 (Google Colab) |
| Framework | HuggingFace Transformers 4.x |
Why This Exists
Tiv is one of Nigeria's major languages, spoken by millions across Benue State and the diaspora, yet it has zero representation in any major NLP benchmark, translation API, or pretrained multilingual model.
Google Translate doesn't support it. DeepL doesn't support it. NLLB-200 doesn't support it.
This model is the first step toward changing that. It is part of TRANSLTR, a real-time spoken language translation system being built to bridge Tiv speakers into the digital world, starting with live event translation at the GCK Benue conference, July 2025, IBB Square, Makurdi.
Usage
from transformers import MarianMTModel, MarianTokenizer

# Load the fine-tuned tokenizer and model from the Hugging Face Hub.
model_id = "victorachede/tiv-translator"
tokenizer = MarianTokenizer.from_pretrained(model_id)
model = MarianMTModel.from_pretrained(model_id)

def translate(text):
    # Tokenize the English input as a PyTorch batch of one sentence.
    inputs = tokenizer(text, return_tensors="pt", padding=True)
    # Decode with beam search, capped at 128 output tokens.
    out = model.generate(**inputs, max_length=128, num_beams=4)
    return tokenizer.decode(out[0], skip_special_tokens=True)

print(translate("For God so loved the world."))
print(translate("The Lord is my shepherd."))
print(translate("Ask and it shall be given unto you."))
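To translate many sentences at once, the same calls can be applied to small batches instead of one sentence at a time. The helper below is a minimal sketch, not part of the released model: `translate_batch` and its `batch_size` default are illustrative names and values, and it assumes `tokenizer` and `model` have been loaded as in the snippet above.

```python
from typing import List


def translate_batch(texts: List[str], tokenizer, model,
                    batch_size: int = 8, max_length: int = 128,
                    num_beams: int = 4) -> List[str]:
    """Translate a list of English sentences in fixed-size batches.

    Hypothetical helper: `tokenizer` and `model` are expected to behave like
    the MarianTokenizer and MarianMTModel objects from the usage snippet.
    """
    results: List[str] = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        # Pad to the longest sentence in the batch; truncate over-long inputs.
        inputs = tokenizer(batch, return_tensors="pt",
                           padding=True, truncation=True)
        out = model.generate(**inputs, max_length=max_length,
                             num_beams=num_beams)
        # batch_decode turns every generated sequence back into text.
        results.extend(tokenizer.batch_decode(out, skip_special_tokens=True))
    return results
```

Called as `translate_batch(sentences, tokenizer, model)`, it returns translations in the same order as the input list. A smaller `batch_size` lowers peak memory use at the cost of more forward passes.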
Training Data
The dataset is a custom-built parallel corpus of English-Tiv sentence pairs, assembled specifically for this project. Sources include:
- Bible text (primary domain): Luke, John, Acts (priority books)
- Conversational Tiv: everyday phrases and common expressions
- Manual curation by native Tiv speakers
Dataset size at time of training: 28,987 verified pairs
The dataset is actively growing, and the model will be retrained as the corpus expands.
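For a sense of scale, the dataset size can be combined with the epoch count and batch size from the Model Summary table to estimate the number of optimizer steps in a run. This is back-of-the-envelope arithmetic, not a record of the actual training run: it assumes all 28,987 pairs were used for training (no held-out split), no gradient accumulation, a single device, and that the final partial batch is kept.

```python
import math


def estimate_steps(num_pairs: int, batch_size: int, epochs: int):
    """Return (steps_per_epoch, total_steps) under the assumptions above."""
    # The last batch of each epoch may be smaller, so round up.
    steps_per_epoch = math.ceil(num_pairs / batch_size)
    return steps_per_epoch, steps_per_epoch * epochs


steps_per_epoch, total_steps = estimate_steps(
    num_pairs=28_987,  # dataset size stated in this section
    batch_size=32,     # from the Model Summary table
    epochs=10,         # from the Model Summary table
)
print(steps_per_epoch, total_steps)  # 906 9060
```

If part of the corpus was held out for validation, or gradient accumulation was used, the real step count would be lower.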
Limitations
- Low-resource reality: 28,987 pairs is a starting point, not a ceiling. Output quality is expected to improve as the dataset expands.
- Domain: currently strongest on Bible-register English. Colloquial or technical text may produce weaker results.
- Tokenizer mismatch: the base MarianMT tokenizer was not trained on Tiv, so subword segmentation of Tiv tokens is imperfect at this stage. A Tiv-native tokenizer is on the roadmap.
- This is v0.1. It is not production-ready. It is a proof-of-concept that this problem is solvable.
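One way to put a number on the tokenizer mismatch is subword fertility: the average number of subword pieces produced per whitespace-separated word. The sketch below is illustrative rather than a measurement reported for this model. `subword_fertility` is a hypothetical helper that takes any tokenize function, so it can be tried with the base tokenizer now and with a Tiv-native tokenizer later.

```python
from typing import Callable, List


def subword_fertility(sentences: List[str],
                      tokenize: Callable[[str], List[str]]) -> float:
    """Average number of subword pieces per whitespace-separated word.

    A value near 1.0 means most words survive as single tokens; higher
    values mean words are being split into many small pieces.
    """
    total_words = 0
    total_pieces = 0
    for sentence in sentences:
        total_words += len(sentence.split())
        total_pieces += len(tokenize(sentence))
    if total_words == 0:
        raise ValueError("need at least one non-empty sentence")
    return total_pieces / total_words
```

With the tokenizer from the Usage section, this could be called as `subword_fertility(tiv_sentences, tokenizer.tokenize)`, where `tiv_sentences` is an assumed list of held-out Tiv text. Running the same function on the English side of the corpus gives a baseline for how much more heavily Tiv words are fragmented.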
Roadmap
- Fine-tune MarianMT base on EN-TIV (28,987 pairs)
- Expand dataset to 100,000+ pairs
- Train Tiv-native SentencePiece tokenizer
- ElevenLabs Tiv voice clone integration (TTS output)
- Groq Whisper STT pipeline (speech input)
- Live demo at GCK Benue, June 2025
- Mobile SDK for offline Tiv translation
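The speech items on this roadmap suggest a three-stage flow: speech-to-text, text translation, then text-to-speech. The sketch below shows only how such stages could be composed, under the assumption that each stage is exposed as a plain function. `SpeechTranslationPipeline` and its field names are hypothetical, and no real Groq, Whisper, or ElevenLabs API calls are shown.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SpeechTranslationPipeline:
    """Hypothetical composition of STT, translation, and TTS stages."""
    transcribe: Callable[[bytes], str]   # English audio -> English text
    translate: Callable[[str], str]      # English text -> Tiv text
    synthesize: Callable[[str], bytes]   # Tiv text -> Tiv audio

    def run(self, audio: bytes) -> bytes:
        english_text = self.transcribe(audio)
        tiv_text = self.translate(english_text)
        return self.synthesize(tiv_text)


# Stand-in stages so the sketch runs without any external service.
demo = SpeechTranslationPipeline(
    transcribe=lambda audio: audio.decode("utf-8"),
    translate=lambda text: text.upper(),
    synthesize=lambda text: text.encode("utf-8"),
)
print(demo.run(b"hello"))  # b'HELLO'
```

Keeping each stage behind a simple callable means the `translate` function from the Usage section could be dropped in as the middle stage, and the speech components could be swapped without touching the rest of the flow.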
Citation
If you use this model or dataset in your research, please cite:
@misc{achede2025tivtranslator,
author = {Victor Achede},
title = {TRANSLTR: English-Tiv Neural Machine Translation},
year = {2025},
publisher = {HuggingFace},
url = {https://huggingface.co/victorachede/tiv-translator}
}
About
Built by Victor Achede, founder of Black Sheep Co., a technology holding company based in Makurdi, Benue State, Nigeria.
TRANSLTR is one of several products under active development targeting African language infrastructure, live event technology, and low-resource NLP.
"If it doesn't exist, build it."