BEREL Linker NER

A Hebrew Named Entity Recognition (NER) model for Rabbinic literature, fine-tuned from BEREL 3.0, a BERT-based language model pre-trained on Rabbinic Hebrew texts by DICTA.

Model Description

This model identifies two entity types in Rabbinic Hebrew text:

Label | Hebrew | Description
Cit (B-מקור / I-מקור) | מקור | Citations: references to Jewish texts and sources
Per (B-בן-אדם / I-בן-אדם) | בן-אדם | Persons: names of people

It uses BIO tagging and was trained to automatically link citations and persons in Sefaria's corpus of Rabbinic literature.
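As a rough illustration of what BIO tagging means in practice, the sketch below collapses a per-word tag sequence into labeled spans. The function name `bio_to_spans` and the example tags are hand-written for illustration and are not taken from the model or its training code; a stray `I-` tag is treated leniently as the start of a span, which is one possible convention among several.

```python
from typing import List, Tuple


def bio_to_spans(tags: List[str]) -> List[Tuple[str, int, int]]:
    """Collapse a BIO tag sequence into (label, start, end) spans; end is exclusive."""
    spans = []
    label, start = None, None
    for i, tag in enumerate(tags):
        # Split on the first hyphen only: the label "בן-אדם" contains a hyphen itself.
        prefix, _, name = tag.partition("-")
        if prefix == "I" and name == label:
            continue  # the currently open span keeps growing
        if label is not None:
            spans.append((label, start, i))  # close the open span
            label, start = None, None
        if prefix in ("B", "I"):
            # "B-" opens a span; a stray "I-" is leniently treated as an opening tag.
            label, start = name, i
    if label is not None:
        spans.append((label, start, len(tags)))
    return spans


# Hypothetical, hand-labeled example (not actual model output).
words = ["דברי", 'הרמב"ם', "בהלכות", "שבת"]
tags = ["O", "B-בן-אדם", "B-מקור", "I-מקור"]

for label, start, end in bio_to_spans(tags):
    print(label, " ".join(words[start:end]))
```

Here the person span covers one word and the citation span covers the last two.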

Training Details

  • Base model: dicta-il/BEREL_3.0
  • Architecture: BertForTokenClassification (BERT-base, 12 layers, 12 attention heads, hidden size 768)
  • Parameters: ~183.8M (larger than standard BERT-base due to the 128,000-token Hebrew vocabulary)
  • Training dataset: Sefaria/he_berel_gold (~3,000 annotated examples of Rabbinic text, split 80/20 train/test)
  • Batch size: 16
  • Early stopping patience: 2 epochs
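The parameter count can be sanity-checked with back-of-the-envelope arithmetic. The sketch below assumes standard BERT-base settings that the card does not state (feed-forward size 3072, 512 positions, 2 token types, no pooler layer) and assumes 7 output labels, guessed from the highest ID in the label map. The helper name `bert_token_classifier_params` is made up for this example, and 30,522 is the vocabulary size of the original English `bert-base-uncased`, used only for comparison.

```python
def bert_token_classifier_params(
    vocab_size: int,
    hidden: int = 768,
    layers: int = 12,
    intermediate: int = 3072,   # assumed BERT-base default
    max_positions: int = 512,   # assumed BERT-base default
    type_vocab: int = 2,        # assumed BERT-base default
    num_labels: int = 7,        # assumption: label IDs run from 0 to 6
) -> int:
    """Approximate parameter count of a BERT token classifier (no pooler)."""
    layer_norm = 2 * hidden  # weight + bias
    # Word, position and token-type embeddings, plus one LayerNorm.
    embeddings = (vocab_size + max_positions + type_vocab) * hidden + layer_norm
    # Query, key, value and output projections (with biases), plus one LayerNorm.
    attention = 4 * (hidden * hidden + hidden) + layer_norm
    # Two dense layers of the feed-forward block (with biases), plus one LayerNorm.
    feed_forward = (
        (hidden * intermediate + intermediate)
        + (intermediate * hidden + hidden)
        + layer_norm
    )
    encoder = layers * (attention + feed_forward)
    classifier = hidden * num_labels + num_labels
    return embeddings + encoder + classifier


hebrew = bert_token_classifier_params(vocab_size=128_000)
english = bert_token_classifier_params(vocab_size=30_522)
print(f"128,000-token vocabulary: {hebrew / 1e6:.1f}M")
print(f"30,522-token vocabulary:  {english / 1e6:.1f}M")
print(f"word embeddings alone:    {128_000 * 768 / 1e6:.1f}M")
```

Under these assumptions the word-embedding matrix alone accounts for roughly 98M parameters, which is where the difference from a standard BERT-base comes from.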

Performance

The best checkpoint was saved at epoch 3 (of a possible 10) via early stopping:

Metric | Score
F1 | 87.2%
Precision | 85.7%
Recall | 88.8%
Eval loss | 0.0815
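The reported scores are internally consistent if F1 is the usual harmonic mean of precision and recall, which is an assumption here since the card does not name the evaluation library. A minimal check, with `f1_from` as a hypothetical helper name:

```python
def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the table above.
f1 = f1_from(0.857, 0.888)
print(f"F1 = {f1:.1%}")  # F1 = 87.2%
```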

Usage

from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

model_id = "Sefaria/berel-linker-ner"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForTokenClassification.from_pretrained(model_id)

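# Token-classification pipeline that merges sub-word tokens into whole entities.
ner = pipeline(
    "ner",
    model=model,
    tokenizer=tokenizer,
    aggregation_strategy="first",  # each word takes the label of its first sub-token
    stride=128,  # token overlap between chunks when the input exceeds the model's maximum length
)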

# Sample input: "The words of the Rambam in Hilkhot Shabbat"
text = "דברי הרמב\"ם בהלכות שבת"
entities = ner(text)
print(entities)
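With an aggregation strategy set, the pipeline returns a list of dictionaries with the keys `entity_group`, `score`, `word`, `start` and `end`. The sketch below shows one way to group the detected mentions by label using the character offsets. The `entities` list is a hand-written stand-in rather than real model output, so its scores and offsets are purely illustrative, and `group_mentions` and `min_score` are hypothetical names introduced for this example.

```python
from collections import defaultdict

text = 'דברי הרמב"ם בהלכות שבת'

# Hand-written stand-in for the result of `ner(text)`; NOT real model predictions.
entities = [
    {"entity_group": "בן-אדם", "score": 0.99, "word": 'הרמב"ם', "start": 5, "end": 11},
    {"entity_group": "מקור", "score": 0.98, "word": "בהלכות שבת", "start": 12, "end": 22},
]


def group_mentions(text, entities, min_score=0.5):
    """Group entity surface forms by label, dropping low-confidence predictions."""
    grouped = defaultdict(list)
    for ent in entities:
        if ent["score"] >= min_score:
            # Slice the original text instead of trusting "word", which the
            # tokenizer may have normalized or re-spaced.
            grouped[ent["entity_group"]].append(text[ent["start"]:ent["end"]])
    return dict(grouped)


mentions = group_mentions(text, entities)
print(mentions)
```

Slicing by `start` and `end` keeps each mention identical to the source text, which is usually what a downstream linking step needs.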

Label Map

{
  "O": 0,
  "I-מקור": 1,
  "I-בן-אדם": 2,
  "B-מקור": 5,
  "B-בן-אדם": 6
}
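When working with raw logits rather than the pipeline, predicted class IDs need to be mapped back to tag names. A minimal sketch, assuming the map above is complete as listed; IDs 3 and 4 do not appear in it, so looking them up here raises a `KeyError`. The helper name `ids_to_tags` and the example IDs are illustrative only.

```python
label2id = {
    "O": 0,
    "I-מקור": 1,
    "I-בן-אדם": 2,
    "B-מקור": 5,
    "B-בן-אדם": 6,
}

# Invert the map so that predicted class IDs can be turned back into tag names.
id2label = {idx: tag for tag, idx in label2id.items()}


def ids_to_tags(ids):
    """Map a sequence of class IDs to BIO tag names."""
    return [id2label[i] for i in ids]


# Hypothetical predicted IDs for a four-word input.
print(ids_to_tags([0, 6, 5, 1]))
```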

About

Developed by Sefaria for automated entity linking in classical Jewish texts.
