ParsBERT: Transformer-based Model for Persian Language Understanding
Paper • 2005.12515 • Published
How to use HooshvareLab/bert-fa-zwnj-base with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("fill-mask", model="HooshvareLab/bert-fa-zwnj-base")

# Load model directly
from transformers import AutoTokenizer, AutoModelForMaskedLM
tokenizer = AutoTokenizer.from_pretrained("HooshvareLab/bert-fa-zwnj-base")
model = AutoModelForMaskedLM.from_pretrained("HooshvareLab/bert-fa-zwnj-base")

A Transformer-based Model for Persian Language Understanding
The new version of BERT for Persian (v3.0) is now available and handles the zero-width non-joiner (ZWNJ) character used in Persian writing. It was also trained on new corpora of multiple types with a new vocabulary.
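For readers unfamiliar with the zero-width non-joiner, the following minimal sketch shows what the character is at the Unicode level (U+200C) and why it is easy to lose in preprocessing. It does not show how the model handles the character internally, which this card does not describe. The helper names `count_zwnj` and `strip_zwnj` and the example word are illustrative assumptions.

```python
import unicodedata

# The zero-width non-joiner is a single invisible code point, U+200C.
ZWNJ = "\u200c"

def count_zwnj(text: str) -> int:
    """Count ZWNJ characters in a string (hypothetical helper)."""
    return text.count(ZWNJ)

def strip_zwnj(text: str) -> str:
    """Remove every ZWNJ, as an overly aggressive cleaner might do."""
    return text.replace(ZWNJ, "")

# The same Persian word written with a ZWNJ and with the two parts fused.
with_zwnj = "می" + ZWNJ + "خواهم"
fused = "می" + "خواهم"

print(unicodedata.name(ZWNJ))        # official Unicode name of the character
print(count_zwnj(with_zwnj))         # number of ZWNJ characters in the word
print(with_zwnj == fused)            # the two spellings are different strings
print(strip_zwnj(with_zwnj) == fused)

# ZWNJ is not whitespace, so whitespace splitting keeps the word in one piece.
print(with_zwnj.split())
```

Because the two spellings are different strings, a tokenizer whose vocabulary does not account for the ZWNJ may treat them inconsistently, which is presumably the problem this version addresses.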
ParsBERT is a monolingual language model based on Google’s BERT architecture. This model is pre-trained on large Persian corpora with various writing styles from numerous subjects (e.g., scientific, novels, news).
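As a rough sketch of how the masked-language-model checkpoint might be queried, the helper below wraps a fill-mask pipeline and returns the top candidate tokens with their scores. The names `top_predictions` and `demo` and the example sentence are assumptions made for illustration; only the `pipeline("fill-mask", ...)` call and the model identifier come from this card.

```python
from typing import Callable, List, Tuple

def top_predictions(fill_mask: Callable, sentence: str, k: int = 5) -> List[Tuple[str, float]]:
    """Return (token, score) pairs for a sentence containing exactly one mask token.

    `fill_mask` is expected to behave like a Transformers fill-mask pipeline:
    called with a string and `top_k`, it returns a list of dicts that include
    the keys "token_str" and "score".
    """
    results = fill_mask(sentence, top_k=k)
    return [(r["token_str"], float(r["score"])) for r in results]

def demo() -> None:
    """Download the checkpoint and print predictions (needs network access)."""
    # Imported here so the helper above stays usable without transformers installed.
    from transformers import pipeline

    pipe = pipeline("fill-mask", model="HooshvareLab/bert-fa-zwnj-base")
    # Use the tokenizer's own mask token instead of hard-coding "[MASK]".
    mask = pipe.tokenizer.mask_token
    sentence = f"زبان فارسی بسیار {mask} است."  # illustrative sentence, not from the card
    for token, score in top_predictions(pipe, sentence):
        print(f"{token}\t{score:.4f}")
```

Calling `demo()` downloads the model on first use and prints one candidate per line. Passing the pipeline in as an argument keeps `top_predictions` easy to exercise with a stub.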
Paper presenting ParsBERT: arXiv:2005.12515
Please cite it in publications as follows:
@article{ParsBERT,
title={ParsBERT: Transformer-based Model for Persian Language Understanding},
author={Mehrdad Farahani and Mohammad Gharachorloo and Marzieh Farahani and Mohammad Manthouri},
journal={ArXiv},
year={2020},
volume={abs/2005.12515}
}
For questions, post a GitHub issue on the ParsBERT Issues repo.