Use XLM-R to build LM on X-languages (X>=5) from scratch

by bonadossou - opened May 29, 2022

May 29, 2022

How can I train the HF implementation of XLM-R on our own languages, from scratch? The documentation is not very clear about it. I have mainly been working with this (https://github.com/facebookresearch/XLM), but it is too complex for my purpose.

patrickvonplaten

May 30, 2022

Hey @bonadossou,

In general, I would not recommend training XLM-R from scratch: it has been pretrained on a wide range of languages, so you should be able to just fine-tune it on your preferred languages. If you really want to run a full pretraining, though, I'd recommend the following example: https://github.com/huggingface/transformers/blob/main/examples/pytorch/language-modeling/run_mlm.py
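To make the two options concrete, here is a rough sketch of how run_mlm.py could be invoked for each. It only builds and prints the command lines and does not start any training. The helper name `build_run_mlm_command` and the file paths are hypothetical. Reusing the `xlm-roberta-base` tokenizer for the from-scratch run is an assumption; a tokenizer trained on your own corpus may suit low-resource languages better. The flags reflect the script as I understand it, so check `python run_mlm.py --help` for your version.

```python
import shlex
import sys


def build_run_mlm_command(train_file, output_dir, from_scratch, validation_file=None):
    """Build an argv list for run_mlm.py (hypothetical helper, illustration only)."""
    cmd = [
        sys.executable, "run_mlm.py",
        "--train_file", train_file,
        "--output_dir", output_dir,
        "--do_train",
    ]
    if from_scratch:
        # No --model_name_or_path: the script then creates randomly
        # initialised weights from the default config of the given model type.
        # Assumption: the pretrained XLM-R tokenizer is reused as-is.
        cmd += ["--model_type", "xlm-roberta", "--tokenizer_name", "xlm-roberta-base"]
    else:
        # Continue masked-LM training from the pretrained checkpoint.
        cmd += ["--model_name_or_path", "xlm-roberta-base"]
    if validation_file is not None:
        cmd += ["--validation_file", validation_file, "--do_eval"]
    return cmd


# Hypothetical corpus path: a plain-text file with your own language data.
scratch_cmd = build_run_mlm_command("corpus.txt", "xlmr-scratch", from_scratch=True)
finetune_cmd = build_run_mlm_command("corpus.txt", "xlmr-finetuned", from_scratch=False)

print(shlex.join(scratch_cmd))
print(shlex.join(finetune_cmd))
```

The only difference between the two runs is whether pretrained weights are loaded, which makes it easy to try fine-tuning first and fall back to a full pretraining only if needed.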

Which languages are you mostly interested in?

bonadossou

May 30, 2022

Many languages, such as Fon, Ghomala, Bambara, etc.

patrickvonplaten

May 30, 2022

OK, did you try just fine-tuning XLM-R on those languages?
