BEREL-seg: TBD

State-of-the-art language model for Rabbinic Hebrew, released [here] - add link.

This model is fine-tuned from BEREL_3.0 for the prefix segmentation task.

Sample usage:

from transformers import AutoModel, AutoTokenizer

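# Load the tokenizer and the fine-tuned segmentation model.
# trust_remote_code=True lets transformers load the repository's custom model code,
# which is where the predict() method used below comes from.
tokenizer = AutoTokenizer.from_pretrained('dicta-il/BEREL-seg')
model = AutoModel.from_pretrained('dicta-il/BEREL-seg', trust_remote_code=True)

# Switch to inference mode (disables dropout).
model.eval()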

sentence = 'וזה לשון הרמב״ן בפירושו על התורה, שהדבר ידוע ומפורסם לכל בעלי העיון שאין המקרא יוצא מידי פשוטו אף על פי שהדרש אמת.'

print(model.predict([sentence], tokenizer))

Output:

[
  [
    [ "[CLS]" ],
    [ "ו", "זה" ],
    [ "לשון" ],
    [ "ה", "רמב\"ן" ],
    [ "ב", "פירושו" ],
    [ "על" ],
    [ "ה", "תורה" ],
    [ ", " ],
    [ "שהד", "בר" ],
    [ "ידוע" ],
    [ "ו", "מפורסם" ],
    [ "ל", "כל" ],
    [ "בעלי" ],
    [ "ה", "עיון" ],
    [ "ש", "אין" ],
    [ "ה", "מקרא" ],
    [ "יוצא" ],
    [ "פשוטו" ],
    [ "אף" ],
    [ "על" ],
    [ "פי" ],
    [ "שהד", "רש" ],
    [ "אמת" ],
    [ "." ],
    [ "[SEP]" ]
  ]
]
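
The nested lists returned by predict can be awkward to read directly. The following is a minimal sketch of one way to post-process them, assuming the structure shown in the sample output: one list per sentence, containing one list of segments per word, with "[CLS]" and "[SEP]" as standalone entries. The helper name render_segmentation and the "|" separator are illustrative choices and are not part of the model's API; the example data is a hand-copied slice of the sample output, not a fresh model run.

```python
from typing import List

# Special tokens that appear as standalone entries in the sample output.
SPECIAL_TOKENS = {"[CLS]", "[SEP]"}


def render_segmentation(prediction: List[List[str]], sep: str = "|") -> str:
    """Join each word's segments with `sep` and drop special tokens.

    `prediction` is the entry for a single sentence, i.e. one element
    of the list returned by model.predict (assumed structure).
    """
    words = []
    for segments in prediction:
        # Skip entries that consist only of a special token.
        if len(segments) == 1 and segments[0] in SPECIAL_TOKENS:
            continue
        words.append(sep.join(segments))
    return " ".join(words)


# Hand-copied slice of the sample output (hypothetical input for illustration).
example = [
    ["[CLS]"],
    ["ו", "זה"],
    ["לשון"],
    ["ה", "רמב\"ן"],
    ["[SEP]"],
]

print(render_segmentation(example))
```

With real model output, the same helper could be applied to each sentence in turn, for example `[render_segmentation(p) for p in model.predict(sentences, tokenizer)]`.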

Citation

If you use BEREL-seg in your research, please cite tbd

BibTeX:

tbd

License

This work is licensed under a Creative Commons Attribution 4.0 International License.