TRANSLTR: English → Tiv Translation Model

Fine-tuned for Tiv, a Benue-Congo language spoken by ~4 million people in Benue State, Nigeria.
Built by Victor Achede under Black Sheep Co.


Model Summary

Property              Detail
Base model            Helsinki-NLP/opus-mt-en-mul
Task                  Machine translation (EN → TIV)
Language pair         English → Tiv (tiv)
Architecture          MarianMT (transformer seq2seq)
Training data         Custom curated EN↔TIV parallel corpus (Bible-domain, conversational)
Fine-tuning epochs    10
Batch size            32
Hardware              NVIDIA T4 (Google Colab)
Framework             HuggingFace Transformers 4.x
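As a rough sanity check on the training budget, the epoch and batch-size settings can be combined with the dataset size reported under Training Data (28,987 pairs). This sketch assumes that every pair was used for training with no held-out split, that there was no gradient accumulation, and that the final partial batch of each epoch was kept. `optimizer_steps` is a hypothetical helper, not part of any library.

```python
import math


def optimizer_steps(num_pairs: int, batch_size: int, epochs: int) -> int:
    """Total optimizer updates for plain mini-batch training.

    Assumes no gradient accumulation and that the last, smaller batch of
    each epoch is still used (i.e. the data loader does not drop it).
    """
    if num_pairs < 0 or batch_size < 1 or epochs < 0:
        raise ValueError("invalid training configuration")
    steps_per_epoch = math.ceil(num_pairs / batch_size)
    return steps_per_epoch * epochs


# Figures taken from this card: 28,987 pairs, batch size 32, 10 epochs.
per_epoch = optimizer_steps(28_987, 32, 1)
total = optimizer_steps(28_987, 32, 10)
print(per_epoch, total)  # 906 9060
```

If a validation split was held out, or if gradient accumulation was enabled, the real step count would be lower.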

Why This Exists

Tiv is one of Nigeria's major languages, spoken by millions across Benue State and the diaspora, yet it has zero representation in any major NLP benchmark, translation API, or pretrained multilingual model.

Google Translate doesn't support it. DeepL doesn't support it. NLLB-200 doesn't support it.

This model is the first step toward changing that. It is part of TRANSLTR, a real-time spoken-language translation system being built to bring Tiv speakers into the digital world, starting with live event translation at the GCK Benue conference (July 2025, IBB Square, Makurdi).


Usage

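```python
# MarianTokenizer also requires the sentencepiece package to be installed.
from transformers import MarianMTModel, MarianTokenizer

model_id  = "victorachede/tiv-translator"
tokenizer = MarianTokenizer.from_pretrained(model_id)
model     = MarianMTModel.from_pretrained(model_id)  # from_pretrained returns the model in eval mode

def translate(text: str) -> str:
    """Translate a single English sentence into Tiv."""
    inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True)
    # Beam search with 4 beams; output is capped at 128 tokens.
    out    = model.generate(**inputs, max_length=128, num_beams=4)
    return tokenizer.decode(out[0], skip_special_tokens=True)

print(translate("For God so loved the world."))
print(translate("The Lord is my shepherd."))
print(translate("Ask and it shall be given unto you."))
```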
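The `translate` helper handles one sentence per call. For longer inputs, such as a full chapter, here is a minimal batching sketch. `batched` and `translate_batch` are hypothetical helpers, and the default batch size of 16 is an arbitrary assumption rather than a tested setting. Padding each mini-batch to its longest sentence and decoding with `batch_decode` are standard Transformers usage.

```python
from typing import Iterator, List, Sequence


def batched(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most `size` items, preserving order."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def translate_batch(texts: Sequence[str], tokenizer, model,
                    batch_size: int = 16, max_length: int = 128,
                    num_beams: int = 4) -> List[str]:
    """Translate many English sentences, one padded mini-batch at a time."""
    results: List[str] = []
    for chunk in batched(texts, batch_size):
        # Pad each chunk to its longest sentence; truncate anything over the model limit.
        inputs = tokenizer(chunk, return_tensors="pt", padding=True, truncation=True)
        outputs = model.generate(**inputs, max_length=max_length, num_beams=num_beams)
        # batch_decode returns one string per generated sequence, in input order.
        results.extend(tokenizer.batch_decode(outputs, skip_special_tokens=True))
    return results
```

Once the usage example has loaded `tokenizer` and `model`, `translate_batch(sentences, tokenizer, model)` should return one Tiv string per input sentence, in order.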

Training Data

The dataset is a custom-built parallel corpus of English-Tiv sentence pairs, assembled specifically for this project. Sources include:

  • Bible text (primary domain): Luke, John, and Acts (priority books)
  • Conversational Tiv: everyday phrases and common expressions
  • Manual curation by native Tiv speakers

Dataset size at time of training: 28,987 verified pairs.
The dataset is actively growing, and the model will be retrained as the corpus expands.
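The corpus draws on several sources and keeps growing, so duplicate pairs can easily creep in between retraining runs. The following sketch assumes pairs are stored as `(english, tiv)` string tuples. It normalizes Unicode and whitespace, drops pairs with an empty side or a case-insensitive duplicate, and makes a reproducible train/validation split. The helper names and the 5% validation fraction are illustrative assumptions, not the project's actual pipeline.

```python
import random
import unicodedata
from typing import List, Tuple

Pair = Tuple[str, str]  # (english, tiv)


def normalize(text: str) -> str:
    # NFC so visually identical strings compare equal; collapse runs of whitespace.
    return " ".join(unicodedata.normalize("NFC", text).split())


def clean_pairs(pairs: List[Pair]) -> List[Pair]:
    """Normalize pairs, drop empty sides, and drop case-insensitive duplicates."""
    seen = set()
    cleaned: List[Pair] = []
    for en, tiv in pairs:
        en, tiv = normalize(en), normalize(tiv)
        if not en or not tiv:
            continue  # one side is missing
        key = (en.lower(), tiv.lower())
        if key in seen:
            continue  # already kept an equivalent pair
        seen.add(key)
        cleaned.append((en, tiv))
    return cleaned


def train_valid_split(pairs: List[Pair], valid_fraction: float = 0.05,
                      seed: int = 0) -> Tuple[List[Pair], List[Pair]]:
    """Shuffle with a fixed seed so the split is identical across runs."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    n_valid = round(len(shuffled) * valid_fraction)
    return shuffled[n_valid:], shuffled[:n_valid]
```

Fixing the seed keeps the validation set stable as long as the input list does not change. Once new pairs are added, the shuffle order changes too, so a stricter setup might assign pairs to splits by hashing the English side instead.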


Limitations

  • Low-resource reality: 28,987 pairs is a starting point, not a ceiling. Outputs improve meaningfully with each dataset expansion.
  • Domain: currently strongest on Bible-register English. Colloquial or technical text may produce weaker results.
  • Tokenizer mismatch: the base MarianMT tokenizer was not trained on Tiv, so subword segmentation of Tiv text is imperfect at this stage. A Tiv-native tokenizer is on the roadmap.
  • This is v0.1. It is not production-ready. It is a proof-of-concept that this problem is solvable.
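Subword fertility is one common way to quantify the tokenizer mismatch. It measures the average number of subword tokens produced per whitespace-separated word, and higher values usually suggest that the vocabulary fits the language poorly. The sketch accepts any tokenize function. That means the same held-out Tiv sentences could be used to compare the current Marian tokenizer with a future Tiv-native one. The `fertility` helper and the toy tokenizer used in testing are illustrative assumptions.

```python
from typing import Callable, Iterable, List


def fertility(sentences: Iterable[str], tokenize: Callable[[str], List[str]]) -> float:
    """Average number of subword tokens per whitespace-separated word."""
    n_words = 0
    n_tokens = 0
    for sentence in sentences:
        n_words += len(sentence.split())
        n_tokens += len(tokenize(sentence))
    if n_words == 0:
        raise ValueError("no words to measure")
    return n_tokens / n_words
```

Marian checkpoints keep separate source-side and target-side SentencePiece models. Tiv text should therefore be measured with the target-side model; how to select it depends on the Transformers version, so check the documentation for the release in use.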

Roadmap

  • Fine-tune MarianMT base on EN → TIV (28,987 pairs)
  • Expand dataset to 100,000+ pairs
  • Train Tiv-native SentencePiece tokenizer
  • ElevenLabs Tiv voice clone integration (TTS output)
  • Groq Whisper STT pipeline (speech input)
  • Live demo at GCK Benue, June 2025
  • Mobile SDK for offline Tiv translation
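The roadmap describes a speech-to-speech chain with three stages: Whisper-based speech recognition (served via Groq) for input, this model for English → Tiv translation, and an ElevenLabs Tiv voice for output. The following is a minimal sketch of how those stages could be composed. Each stage sits behind a small interface, so providers can be swapped out or stubbed in tests. The interface names and method signatures are hypothetical and are not the Groq, ElevenLabs, or Transformers APIs.

```python
from dataclasses import dataclass
from typing import Protocol, Tuple


class SpeechToText(Protocol):
    def transcribe(self, audio: bytes) -> str: ...


class Translator(Protocol):
    def translate(self, text: str) -> str: ...


class TextToSpeech(Protocol):
    def synthesize(self, text: str) -> bytes: ...


@dataclass
class SpeechTranslationPipeline:
    """Chains English speech recognition, EN to TIV translation, and Tiv speech synthesis."""

    stt: SpeechToText
    mt: Translator
    tts: TextToSpeech

    def run(self, audio: bytes) -> Tuple[str, str, bytes]:
        english = self.stt.transcribe(audio)
        if not english.strip():
            # Nothing was recognised (e.g. silence): skip translation and synthesis.
            return english, "", b""
        tiv = self.mt.translate(english)
        return english, tiv, self.tts.synthesize(tiv)
```

In a live setting, audio would likely be processed in short segments rather than as whole recordings. This sketch leaves segmentation and streaming out.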

Citation

If you use this model or dataset in your research, please cite:

@misc{achede2025tivtranslator,
  author    = {Victor Achede},
  title     = {TRANSLTR: English-Tiv Neural Machine Translation},
  year      = {2025},
  publisher = {HuggingFace},
  url       = {https://huggingface.co/victorachede/tiv-translator}
}

About

Built by Victor Achede, founder of Black Sheep Co., a technology holding company based in Makurdi, Benue State, Nigeria.
TRANSLTR is one of several products under active development targeting African language infrastructure, live event technology, and low-resource NLP.

"If it doesn't exist, build it."

Model Files

Safetensors weights, 0.1B parameters, F32 tensors.