Instructions to use urduhack/roberta-urdu-small with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use urduhack/roberta-urdu-small with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("fill-mask", model="urduhack/roberta-urdu-small")

# Load model directly
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("urduhack/roberta-urdu-small")
model = AutoModelForMaskedLM.from_pretrained("urduhack/roberta-urdu-small")
```
- Notebooks
- Google Colab
- Kaggle
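As a minimal sketch of querying the pipeline, the snippet below fills a masked Urdu sentence. The example sentence and `top_k` value are illustrative assumptions, not from the model card; the mask token is read from the pipeline's tokenizer rather than hard-coded:

```python
from transformers import pipeline

# Load the fill-mask pipeline for the Urdu RoBERTa model
pipe = pipeline("fill-mask", model="urduhack/roberta-urdu-small")

# Use the tokenizer's own mask token; the Urdu sentence
# ("this is a <mask>.") is only an illustrative placeholder.
masked = f"یہ ایک {pipe.tokenizer.mask_token} ہے۔"
results = pipe(masked, top_k=3)

# Each result carries a candidate token and its score
for r in results:
    print(r["token_str"], round(r["score"], 4))
```

Each entry in `results` is a dict with the predicted `token_str`, its `score`, and the fully filled `sequence`.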
Guidelines for training an Urdu tokenizer on a much larger corpus
#1
by hadidev - opened
Hi, I'm trying to train a tokenizer as well as a BERT model on Urdu data. Can you share the step-by-step process you used for this model?
Installation
Urduhack officially supports Python 3.6–3.7, and runs great on PyPy.
To install with the TensorFlow CPU version:
$ pip install urduhack[tf]
To install with the TensorFlow GPU version:
$ pip install urduhack[tf-gpu]
Usage
```python
import urduhack

# Downloading models
urduhack.download()

nlp = urduhack.Pipeline()
text = ""
doc = nlp(text)
for sentence in doc.sentences:
    print(sentence.text)
    for word in sentence.words:
        print(f"{word.text}\t{word.pos}")
    for token in sentence.tokens:
        print(f"{token.text}\t{token.ner}")
```