Khayyam — Persian Multi-Label Emotion Classifier

Khayyam is a Persian BERT model fine-tuned for fine-grained, multi-label emotion classification across 28 categories (27 emotions plus neutral) from the GoEmotions taxonomy. It is built on ParsBERT v3 and trained on GoEmotions Persian, a Persian translation of Google's GoEmotions benchmark dataset.

The model is named after Omar Khayyam, the 11th-century Persian poet, mathematician, and philosopher.


Model Details

Property            Value
Base model          HooshvareLab/bert-fa-zwnj-base (ParsBERT v3)
Task                Multi-label text classification
Language            Persian (Farsi / fa)
Number of labels    28
Decision threshold  0.3
License             Apache 2.0

Evaluation Results

Evaluated on the held-out test split of GoEmotions Persian (5,427 samples):

Metric    Score
F1 Macro  0.4124
F1 Micro  0.5563

For reference, the original GoEmotions paper (Demszky et al., 2020) reports a macro F1 of 0.46 for BERT-base on the English dataset. Khayyam's 0.41 is about 0.05 lower, a gap that plausibly reflects translation noise and Persian-specific linguistic features. The two scores are measured on different data (English originals versus Persian translations), so the comparison is indicative only.
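The gap between the macro and micro scores comes from how each one averages over labels. As a minimal pure-Python sketch on toy data (the function name `f1_scores` is hypothetical, and this is not the evaluation script used for this model): macro F1 averages the per-label F1 values, so a rare label counts as much as a frequent one, while micro F1 pools every label decision before computing a single score.

```python
def f1_scores(y_true, y_pred):
    """Return (macro_f1, micro_f1) for multi-label 0/1 matrices given as lists of rows."""
    n_labels = len(y_true[0])
    per_label_f1 = []
    tp_all = fp_all = fn_all = 0
    for j in range(n_labels):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 0 and p[j] == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 0)
        denom = 2 * tp + fp + fn
        # A label with no positives and no predictions is scored 0.0 here
        per_label_f1.append(2 * tp / denom if denom else 0.0)
        tp_all += tp
        fp_all += fp
        fn_all += fn
    macro = sum(per_label_f1) / n_labels
    denom = 2 * tp_all + fp_all + fn_all
    micro = 2 * tp_all / denom if denom else 0.0
    return macro, micro

# Toy data: label 0 is frequent and predicted well, label 1 is rare and missed.
y_true = [[1, 0], [1, 0], [1, 0], [0, 1]]
y_pred = [[1, 0], [1, 0], [1, 0], [0, 0]]
macro, micro = f1_scores(y_true, y_pred)
print(round(macro, 3), round(micro, 3))  # 0.5 0.857
```

One missed rare label halves the macro score but barely moves the micro score, which is the same pattern as the 0.41 versus 0.56 reported above.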

Training curve

Epoch  Train Loss  Val Loss  F1 Macro  F1 Micro
1      0.1029      0.1026    0.3231    0.5313
2      0.0934      0.0950    0.3817    0.5597
3      0.0826      0.0939    0.4016    0.5585
4      0.0768      0.0951    0.4105    0.5574
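Read as data, the curve shows that the "best" epoch depends on the criterion. The sketch below copies the table values into a hypothetical `history` list and picks the best epoch under three criteria. This card does not state which criterion was used to select the released checkpoint.

```python
# Values copied from the training-curve table above.
history = [
    {"epoch": 1, "train_loss": 0.1029, "val_loss": 0.1026, "f1_macro": 0.3231, "f1_micro": 0.5313},
    {"epoch": 2, "train_loss": 0.0934, "val_loss": 0.0950, "f1_macro": 0.3817, "f1_micro": 0.5597},
    {"epoch": 3, "train_loss": 0.0826, "val_loss": 0.0939, "f1_macro": 0.4016, "f1_micro": 0.5585},
    {"epoch": 4, "train_loss": 0.0768, "val_loss": 0.0951, "f1_macro": 0.4105, "f1_micro": 0.5574},
]

best_by_val_loss = min(history, key=lambda row: row["val_loss"])["epoch"]
best_by_f1_macro = max(history, key=lambda row: row["f1_macro"])["epoch"]
best_by_f1_micro = max(history, key=lambda row: row["f1_micro"])["epoch"]

print(best_by_val_loss, best_by_f1_macro, best_by_f1_micro)  # 3 4 2
```

Validation loss bottoms out at epoch 3 and micro F1 peaks at epoch 2, while macro F1 is still rising at epoch 4. The later epochs therefore mainly help the rarer labels.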

Usage

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "aydakikio/Khayyam"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

LABELS = [
    "admiration", "amusement", "anger", "annoyance", "approval", "caring",
    "confusion", "curiosity", "desire", "disappointment", "disapproval",
    "disgust", "embarrassment", "excitement", "fear", "gratitude", "grief",
    "joy", "love", "nervousness", "optimism", "pride", "realization",
    "relief", "remorse", "sadness", "surprise", "neutral"
]

THRESHOLD = 0.3

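def predict_emotions(text):
    """Return every emotion label whose sigmoid probability is at least THRESHOLD."""
    # Tokenize a single text, truncating to the 128-token training length
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
    with torch.no_grad():  # inference only, no gradients needed
        logits = model(**inputs).logits
    # Sigmoid rather than softmax: each label is scored independently
    probs = torch.sigmoid(logits).squeeze(0)
    predicted = [LABELS[i] for i, p in enumerate(probs) if p >= THRESHOLD]
    return predicted

# Example (English: "I'm very happy today, everything went great!")
text = "امروز خیلی خوشحالم، همه چیز عالی پیش رفت!"
print(predict_emotions(text))
# Labels come back in LABELS order, e.g. ['excitement', 'joy', 'optimism']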
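
When the scores matter as well as the labels, the thresholding step can return (label, probability) pairs instead. The helper below is a hypothetical addition, not part of the model's API. It accepts any sequence of 28 probabilities, for example the `probs` tensor from `predict_emotions` converted with `.tolist()`, and is demonstrated here on made-up numbers.

```python
LABELS = [
    "admiration", "amusement", "anger", "annoyance", "approval", "caring",
    "confusion", "curiosity", "desire", "disappointment", "disapproval",
    "disgust", "embarrassment", "excitement", "fear", "gratitude", "grief",
    "joy", "love", "nervousness", "optimism", "pride", "realization",
    "relief", "remorse", "sadness", "surprise", "neutral",
]

def decode_emotions(probs, labels=LABELS, threshold=0.3):
    """Return (label, probability) pairs at or above threshold, highest first."""
    if len(probs) != len(labels):
        raise ValueError(f"expected {len(labels)} probabilities, got {len(probs)}")
    hits = [(label, float(p)) for label, p in zip(labels, probs) if p >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)

# Made-up probabilities for illustration only (not real model output).
fake_probs = [0.01] * len(LABELS)
fake_probs[LABELS.index("joy")] = 0.82
fake_probs[LABELS.index("excitement")] = 0.41
fake_probs[LABELS.index("optimism")] = 0.29  # just under the 0.3 threshold

print(decode_emotions(fake_probs))
# [('joy', 0.82), ('excitement', 0.41)]
```

Because each label is thresholded independently, the result can also be empty when no probability reaches 0.3.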

Training Details

Dataset

  • Dataset: GoEmotions Persian
  • Source: Persian translation of Google GoEmotions using the Gap GPT API
  • Size: 54,263 samples (train: 43,410 / dev: 5,426 / test: 5,427)
  • Labels: 27 emotion categories + Neutral (28 total)
  • License: CC BY 4.0

Hyperparameters

Parameter            Value
Epochs               4
Learning rate        2e-5
Batch size           16
Max sequence length  128
Warmup ratio         0.1
Weight decay         0.01
Optimizer            AdamW
Loss function        Binary Cross-Entropy (multi-label)
Precision            fp16
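
Binary cross-entropy treats each of the 28 labels as an independent yes/no decision, which is what allows one text to carry several emotions. A minimal pure-Python sketch of the per-example loss follows. It assumes the mean over labels, which is the default reduction of PyTorch's `BCEWithLogitsLoss`. The training code is not shown in this card, so the function name and toy numbers are illustrative only.

```python
import math

def bce_with_logits(logits, targets):
    """Mean binary cross-entropy over labels, computed from raw logits."""
    total = 0.0
    for z, y in zip(logits, targets):
        # Numerically stable form of
        # -[y * log(sigmoid(z)) + (1 - y) * log(1 - sigmoid(z))]
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(logits)

# Toy example with 3 labels: the first two are active, the third is not.
logits = [2.0, -1.0, -3.0]
targets = [1.0, 1.0, 0.0]
loss = bce_with_logits(logits, targets)
print(round(loss, 4))  # 0.4963
```

Most of the loss in this example comes from the second label, which is active but received a negative logit.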

Emotion Labels

The model classifies text into any combination of these 28 emotions:

admiration · amusement · anger · annoyance · approval · caring · confusion · curiosity · desire · disappointment · disapproval · disgust · embarrassment · excitement · fear · gratitude · grief · joy · love · nervousness · optimism · pride · realization · relief · remorse · sadness · surprise · neutral


Limitations

  • Trained on machine-translated data, so some label noise from translation errors is likely
  • Performance on rare emotion classes (e.g., grief, pride, relief) is lower than on frequent ones
  • Optimized for short, Reddit-style Persian text; accuracy on formal or dialectal Persian may be lower and is not reported here

Citation

If you use this model, please cite:

Version 1.0
@misc{ayda_khoshkhan_2026,
    author       = { Ayda Khoshkhan },
    title        = { Khayyam (Revision 4cbc751) },
    year         = 2026,
    url          = { https://huggingface.co/aydakikio/Khayyam },
    doi          = { 10.57967/hf/9089 },
    publisher    = { Hugging Face }
}

Also cite the base model and dataset:

@article{ParsBERT,
  title     = {ParsBERT: Transformer-based Model for Persian Language Understanding},
  author    = {Mehrdad Farahani and Mohammad Gharachorloo and Marzieh Farahani and Mohammad Manthouri},
  journal   = {Neural Processing Letters},
  year      = {2021},
  publisher = {Springer}
}

@inproceedings{demszky-etal-2020-goemotions,
  title     = {{G}o{E}motions: A Dataset of Fine-Grained Emotions},
  author    = {Demszky, Dorottya and Movshovitz-Attias, Dana and Ko, Jeongwoo and Cowen, Alan and Nemade, Gaurav and Ravi, Sujith},
  booktitle = {Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics},
  year      = {2020},
  publisher = {Association for Computational Linguistics}
}

License

This model is released under the Apache 2.0 license, consistent with the base model (HooshvareLab/bert-fa-zwnj-base) and the original GoEmotions dataset.

The training dataset (aydakikio/goemotion_persian) is licensed under CC BY 4.0.
