Khayyam — Persian Multi-Label Emotion Classifier

Khayyam is a fine-tuned Persian BERT model for fine-grained, multi-label emotion classification across 28 emotion categories, based on the GoEmotions taxonomy. It is built on top of ParsBERT v3 and trained on GoEmotions Persian — a Persian translation of Google's GoEmotions benchmark dataset.

The model is named after Omar Khayyam, the 11th-century Persian poet, mathematician, and philosopher.

Model Details

Property	Value
Base model	`HooshvareLab/bert-fa-zwnj-base` (ParsBERT v3)
Task	Multi-label text classification
Language	Persian (Farsi / `fa`)
Number of labels	28
Decision threshold	0.3
License	Apache 2.0

Evaluation Results

Evaluated on the held-out test split of GoEmotions Persian (5,427 samples):

Metric	Score
F1 Macro	0.4124
F1 Micro	0.5563

For reference, the original GoEmotions paper (Demszky et al., 2020) reports an F1 Macro of 0.46 for BERT-base on the English dataset. Khayyam's test F1 Macro of 0.4124 is about 0.05 lower, a modest gap given the additional challenges of translation noise and Persian-specific linguistic features.
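
The gap between F1 Macro and F1 Micro follows from how the two averages are defined: macro averages the per-label F1 scores with equal weight, so rare labels that score poorly pull it down, while micro pools all label decisions before computing a single F1. The sketch below is a minimal pure-Python illustration on toy data. The function name `f1_scores` and the convention of scoring an empty label as 0 are assumptions, not a description of the evaluation script used for this model.

```python
def f1_scores(y_true, y_pred):
    """Return (macro_f1, micro_f1) for multi-hot label matrices (lists of 0/1 rows)."""
    num_labels = len(y_true[0])
    tp = [0] * num_labels
    fp = [0] * num_labels
    fn = [0] * num_labels
    for true_row, pred_row in zip(y_true, y_pred):
        for j, (t, p) in enumerate(zip(true_row, pred_row)):
            if t and p:
                tp[j] += 1
            elif p:
                fp[j] += 1
            elif t:
                fn[j] += 1

    def f1(tp_, fp_, fn_):
        # F1 = 2TP / (2TP + FP + FN); assumed to be 0 when the denominator is 0
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    # Macro: mean of per-label F1. Micro: F1 of the pooled counts.
    macro = sum(f1(tp[j], fp[j], fn[j]) for j in range(num_labels)) / num_labels
    micro = f1(sum(tp), sum(fp), sum(fn))
    return macro, micro


# Toy data: three labels; the third is rare and is never predicted.
y_true = [[1, 0, 0], [1, 1, 0], [0, 1, 0], [0, 0, 1]]
y_pred = [[1, 0, 0], [1, 1, 0], [0, 1, 0], [1, 0, 0]]
macro, micro = f1_scores(y_true, y_pred)
print(round(macro, 4), round(micro, 4))  # 0.6 0.8
```

In the toy data the missed rare label contributes an F1 of 0 to the macro average but only one false negative to the pooled micro counts, which is the same pattern as a macro score sitting below the micro score.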

Training curve

Epoch	Train Loss	Val Loss	F1 Macro	F1 Micro
1	0.1029	0.1026	0.3231	0.5313
2	0.0934	0.0950	0.3817	0.5597
3	0.0826	0.0939	0.4016	0.5585
4	0.0768	0.0951	0.4105	0.5574
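
The table's columns do not agree on a single best epoch. The short sketch below copies the four rows from the table and picks the best epoch under each criterion. The variable names are illustrative, and the model card does not state which criterion was used to choose the released checkpoint.

```python
# Per-epoch metrics copied from the training curve table.
history = [
    {"epoch": 1, "train_loss": 0.1029, "val_loss": 0.1026, "f1_macro": 0.3231, "f1_micro": 0.5313},
    {"epoch": 2, "train_loss": 0.0934, "val_loss": 0.0950, "f1_macro": 0.3817, "f1_micro": 0.5597},
    {"epoch": 3, "train_loss": 0.0826, "val_loss": 0.0939, "f1_macro": 0.4016, "f1_micro": 0.5585},
    {"epoch": 4, "train_loss": 0.0768, "val_loss": 0.0951, "f1_macro": 0.4105, "f1_micro": 0.5574},
]

# Best epoch under three different selection criteria.
best_by_val_loss = min(history, key=lambda row: row["val_loss"])["epoch"]
best_by_f1_macro = max(history, key=lambda row: row["f1_macro"])["epoch"]
best_by_f1_micro = max(history, key=lambda row: row["f1_micro"])["epoch"]

print(best_by_val_loss, best_by_f1_macro, best_by_f1_micro)  # 3 4 2
```

Validation loss bottoms out at epoch 3 and F1 Micro peaks at epoch 2, while F1 Macro is still improving at epoch 4.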

Usage

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "aydakikio/Khayyam"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

LABELS = [
    "admiration", "amusement", "anger", "annoyance", "approval", "caring",
    "confusion", "curiosity", "desire", "disappointment", "disapproval",
    "disgust", "embarrassment", "excitement", "fear", "gratitude", "grief",
    "joy", "love", "nervousness", "optimism", "pride", "realization",
    "relief", "remorse", "sadness", "surprise", "neutral"
]

THRESHOLD = 0.3

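def predict_emotions(text):
    """Return the emotion labels whose sigmoid probability is at least THRESHOLD."""
    # Truncate to 128 tokens, the maximum sequence length used in training
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
    with torch.no_grad():
        logits = model(**inputs).logits
    # Independent sigmoid per label (multi-label); drop only the batch dimension
    probs = torch.sigmoid(logits).squeeze(0)
    predicted = [LABELS[i] for i, p in enumerate(probs) if p.item() >= THRESHOLD]
    return predicted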

# Example
text = "امروز خیلی خوشحالم، همه چیز عالی پیش رفت!"
print(predict_emotions(text))
# → e.g. ['excitement', 'joy', 'optimism'] (labels are returned in LABELS order)
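
With a fixed threshold of 0.3, some inputs may produce no label at all, and the function above discards the scores. The following sketch shows one possible post-processing step on a plain list of probabilities: it returns labels with their scores, highest first, and optionally falls back to the single top label. The helper `select_labels` and its `fallback_to_top` option are hypothetical and not part of the model or of Transformers. The probabilities in the example are made up for illustration.

```python
def select_labels(probs, labels, threshold=0.3, fallback_to_top=True):
    """Return (label, probability) pairs at or above the threshold, highest first."""
    scored = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    selected = [(label, p) for label, p in scored if p >= threshold]
    # Assumed design choice: if nothing passes, return the top-scoring label
    if not selected and fallback_to_top and scored:
        selected = [scored[0]]
    return selected


# Illustrative probabilities only, not real model output.
demo_labels = ["joy", "optimism", "sadness"]
print(select_labels([0.81, 0.35, 0.02], demo_labels))
# [('joy', 0.81), ('optimism', 0.35)]
print(select_labels([0.10, 0.20, 0.05], demo_labels))
# [('optimism', 0.2)]
```

To use it with the model, the `probs` tensor computed inside `predict_emotions` can be passed as `probs.tolist()` together with `LABELS`.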

Training Details

Dataset

Dataset: GoEmotions Persian
Source: Persian translation of Google GoEmotions using the Gap GPT API
Size: 54,263 samples (train: 43,410 / dev: 5,426 / test: 5,427)
Labels: 27 emotion categories + Neutral (28 total)
License: CC BY 4.0

Hyperparameters

Parameter	Value
Epochs	4
Learning rate	2e-5
Batch size	16
Max sequence length	128
Warmup ratio	0.1
Weight decay	0.01
Optimizer	AdamW
Loss function	Binary Cross-Entropy (multi-label)
Precision	fp16
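
Binary cross-entropy for multi-label classification treats each of the 28 labels as an independent yes/no decision and averages the per-label losses. The sketch below is a minimal pure-Python version for a single example, written in the numerically stable form that operates on raw logits. It mirrors what `torch.nn.BCEWithLogitsLoss` computes with mean reduction; whether the training script used exactly that class is an assumption, and the function name `bce_with_logits` is illustrative.

```python
import math


def bce_with_logits(logits, targets):
    """Mean binary cross-entropy over the labels of one example.

    logits:  raw model outputs, one per label
    targets: 0/1 ground-truth values, one per label
    """
    total = 0.0
    for z, y in zip(logits, targets):
        # Stable form of -[y*log(sigmoid(z)) + (1-y)*log(1-sigmoid(z))]:
        # max(z, 0) - z*y + log(1 + exp(-|z|))
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(logits)


# Three labels, only the first is present in the ground truth.
loss = bce_with_logits([2.0, -1.0, 0.5], [1, 0, 0])
print(round(loss, 4))
```

A logit of 0 corresponds to a probability of 0.5 and costs log 2 regardless of the target, which makes a quick sanity check for the implementation.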

Emotion Labels

The model classifies text into any combination of these 28 emotions:

admiration · amusement · anger · annoyance · approval · caring · confusion · curiosity · desire · disappointment · disapproval · disgust · embarrassment · excitement · fear · gratitude · grief · joy · love · nervousness · optimism · pride · realization · relief · remorse · sadness · surprise · neutral
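
For training or evaluation, each sample's label set is typically encoded as a 28-dimensional multi-hot vector. The sketch below builds the index mappings and the encoder in plain Python. It assumes the label order given in the Usage section; the names `label2id`, `id2label`, and `to_multi_hot` are illustrative, and the actual mapping stored in the model's config should be checked before relying on it.

```python
# Label order taken from the LABELS list in the Usage section (assumed to match the model head).
LABELS = [
    "admiration", "amusement", "anger", "annoyance", "approval", "caring",
    "confusion", "curiosity", "desire", "disappointment", "disapproval",
    "disgust", "embarrassment", "excitement", "fear", "gratitude", "grief",
    "joy", "love", "nervousness", "optimism", "pride", "realization",
    "relief", "remorse", "sadness", "surprise", "neutral",
]

label2id = {label: i for i, label in enumerate(LABELS)}
id2label = {i: label for label, i in label2id.items()}


def to_multi_hot(names):
    """Encode a collection of label names as a 28-dimensional 0.0/1.0 vector."""
    vector = [0.0] * len(LABELS)
    for name in names:
        vector[label2id[name]] = 1.0
    return vector


encoded = to_multi_hot(["joy", "optimism"])
print(len(encoded), [i for i, v in enumerate(encoded) if v == 1.0])  # 28 [17, 20]
```

Floats are used rather than integers because binary cross-entropy losses generally expect floating-point targets.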

Limitations

Trained on translated data — some label noise may exist due to automatic translation
Performance on rare emotion classes (e.g., grief, pride, relief) is lower than on frequent ones
Optimized for Persian Reddit-style short text; may generalize differently to formal or dialectal Persian

Citation

If you use this model, please cite:

Version 1.0

@misc{ayda_khoshkhan_2026,
    author       = { Ayda Khoshkhan },
    title        = { Khayyam (Revision 4cbc751) },
    year         = 2026,
    url          = { https://huggingface.co/aydakikio/Khayyam },
    doi          = { 10.57967/hf/9089 },
    publisher    = { Hugging Face }
}

Also cite the base model and dataset:

@article{ParsBERT,
  title     = {ParsBERT: Transformer-based Model for Persian Language Understanding},
  author    = {Mehrdad Farahani and Mohammad Gharachorloo and Marzieh Farahani and Mohammad Manthouri},
  journal   = {Neural Processing Letters},
  year      = {2021},
  publisher = {Springer}
}

@inproceedings{demszky-etal-2020-goemotions,
  title     = {{G}o{E}motions: A Dataset of Fine-Grained Emotions},
  author    = {Demszky, Dorottya and Movshovitz-Attias, Dana and Ko, Jeongwoo and Cowen, Alan and Nemade, Gaurav and Ravi, Sujith},
  booktitle = {Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics},
  year      = {2020},
  publisher = {Association for Computational Linguistics}
}

License

This model is released under the Apache 2.0 license, consistent with the base model (HooshvareLab/bert-fa-zwnj-base) and the original GoEmotions dataset.

The training dataset (aydakikio/goemotion_persian) is licensed under CC BY 4.0.

Model size: 0.1B params (Safetensors, tensor type F32)
