How to use Bheri/ithasa-mmbert-v3 with sentence-transformers:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("Bheri/ithasa-mmbert-v3")
sentences = [
    "The false promise of sovereignty.",
    "२ जिगीषोः प्रयाणे च अमरः।",
    "सत्त्वहेतु सुदृढकृत प्रतिज्ञा।",
    "उम्र क्रमशः 16 वर्ष व 15 वर्ष है।",
]
embeddings = model.encode(sentences)
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([4, 4])
```

This is a sentence-transformers model finetuned from jhu-clsp/mmBERT-base. It maps sentences and paragraphs to a 768-dimensional dense vector space. Inputs longer than 128 tokens are truncated. The model can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Full model architecture:

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 128, 'do_lower_case': False, 'architecture': 'ModernBertModel'})
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
```
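The `Pooling` module above sets `pooling_mode_mean_tokens: True`. The sentence embedding is therefore the average of the token embeddings produced by the ModernBERT encoder, and the attention mask excludes padding positions from that average. Below is a minimal pure-Python sketch of that computation. `mean_pool` is a hypothetical helper for illustration only and is not part of the sentence-transformers API. The toy 3-dimensional vectors stand in for real 768-dimensional token embeddings.

```python
from typing import List


def mean_pool(token_embeddings: List[List[float]], attention_mask: List[int]) -> List[float]:
    """Average the token vectors whose attention-mask entry is 1 (padding is skipped)."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vector, mask in zip(token_embeddings, attention_mask):
        if mask:  # only real tokens contribute to the average
            for i, value in enumerate(vector):
                summed[i] += value
            count += 1
    # Avoid division by zero if every position is padding
    count = max(count, 1)
    return [value / count for value in summed]


# Three real tokens followed by one padding token (toy 3-d embeddings)
tokens = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0], [2.0, 2.0, 2.0], [9.0, 9.0, 9.0]]
mask = [1, 1, 1, 0]
print(mean_pool(tokens, mask))  # [2.0, 2.0, 2.0]
```

The padding vector `[9.0, 9.0, 9.0]` does not affect the result. This is why masked averaging, rather than a plain mean over all positions, is needed for batches of different-length sentences.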
First install the Sentence Transformers library:
```bash
pip install -U sentence-transformers
```
Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("Bheri/ithasa-mmbert-v3")
# Run inference
sentences = [
    'attenuated vaccines:',
    'कम संवेदनशील टीकेः',
    '६.५% दसादशे',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.3723, 0.1543],
#         [0.3723, 1.0000, 0.2746],
#         [0.1543, 0.2746, 1.0000]])
```
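The same pairwise scores support semantic search: embed a query, score it against a corpus, and keep the highest-scoring entries. The sketch below does this in pure Python on toy vectors. It assumes cosine similarity, which matches the `cos_sim` used by the training loss and the 1.0000 diagonal in the output above; the card does not state which similarity function `model.similarity` is configured with. `top_k_search` is a hypothetical helper. For real workloads, sentence-transformers provides `util.semantic_search`.

```python
import math
from typing import List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k_search(query: List[float], corpus: List[List[float]], k: int = 2) -> List[Tuple[int, float]]:
    """Return (corpus_index, score) pairs for the k corpus vectors most similar to query."""
    scored = [(idx, cosine(query, doc)) for idx, doc in enumerate(corpus)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 3-d vectors standing in for 768-d model embeddings
corpus = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.7, 0.7, 0.0]]
query = [0.9, 0.1, 0.0]
for idx, score in top_k_search(query, corpus, k=2):
    print(idx, round(score, 4))
```

Sorting every score is fine for a handful of documents. Larger corpora would normally use batched matrix products and a partial top-k selection instead.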
Translation metrics on the `eval-en-sa` dataset, computed with `TranslationEvaluator`:

| Metric | Value |
|---|---|
| src2trg_accuracy | 0.616 |
| trg2src_accuracy | 0.604 |
| mean_accuracy | 0.61 |
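These metrics measure bitext retrieval. `src2trg_accuracy` is the fraction of source sentences whose most similar target embedding, by cosine similarity, is their own translation. Given the dataset name, the source side is presumably English and the target side Sanskrit. `trg2src_accuracy` is the same check in the other direction, and `mean_accuracy` is their average: (0.616 + 0.604) / 2 = 0.61. The pure-Python sketch below illustrates that logic on toy vectors. It is a simplified illustration under these assumptions, not the `TranslationEvaluator` source, and `retrieval_accuracy` is a hypothetical name.

```python
import math
from typing import List


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def retrieval_accuracy(queries: List[List[float]], candidates: List[List[float]]) -> float:
    """Fraction of queries i whose highest-cosine candidate is candidates[i]."""
    correct = 0
    for i, query in enumerate(queries):
        scores = [cosine(query, cand) for cand in candidates]
        best = max(range(len(scores)), key=scores.__getitem__)
        correct += int(best == i)
    return correct / len(queries)


# Toy parallel embeddings: src[i] is meant to align with trg[i]
src = [[1.0, 0.0], [0.0, 1.0], [0.8, 0.6]]
trg = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]

src2trg = retrieval_accuracy(src, trg)
trg2src = retrieval_accuracy(trg, src)
print(round(src2trg, 4), round(trg2src, 4), round((src2trg + trg2src) / 2, 4))  # 1.0 0.6667 0.8333
```

The two directions can disagree. In the toy data, `trg[2]` is closer to `src[0]` than to its own pair, so the target-to-source direction misses one item. The reported scores show the same kind of asymmetry (0.616 vs 0.604).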
Training dataset:

Columns: `sentence1` and `sentence2`.

| | sentence1 | sentence2 |
|---|---|---|
| type | string | string |
Samples:

| sentence1 | sentence2 |
|---|---|
| There was no Mughal tradition of primogeniture, the systematic passing of rule, upon an emperor's death, to his eldest son. | चक्रवर्तिनः मृत्योः अनन्तरं तस्य शासनस्य व्यवस्थितरूपेण सङ्क्रमणस्य, मुघलपरम्परायाः ज्येष्ठपुत्राधिकारपद्धतिः नासीत्। |
| The four sons of Shah Jahan all held governorships during their father's reign. | शाह्-जहाँ-नामकस्य चत्वारः पुत्राः, सर्वे पितुः शासनकाले शासकपदम् अधारयन्। |
| In this regard he discusses the correlation between social opportunities of education and health and how both of these complement economic and political freedoms as a healthy and well-educated person is better suited to make informed economic decisions and be involved in fruitful political demonstrations etc. | अस्मिन् विषये सः शिक्षणस्य स्वास्थ्यस्य च सामाजिकावकाशानाम् अन्योन्य-सम्बन्धस्य, तथा च एतद्द्वयम् अपि आर्थिक-राजनैतिक-स्वातन्त्र्ययोः कथं पूरकं भवतः इति च चर्चां करोति, यतोहि स्वस्था सुशिक्षिता च व्यक्तिः ज्ञानपूर्वम् आर्थिकविषयान् निर्णेतुं तथा फलप्रदेषु राजनैतिकेषु प्रतिपादनादिषु संलग्नः भवितुं च अधिकारी भवति इति। |
Loss: `MultipleNegativesRankingLoss` with these parameters:

```json
{
    "scale": 20.0,
    "similarity_fct": "cos_sim",
    "gather_across_devices": false
}
```
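`MultipleNegativesRankingLoss` uses in-batch negatives. For each `sentence1`, every other `sentence2` in the batch serves as a negative. The loss is cross-entropy over the cosine-similarity matrix multiplied by `scale` (20.0 here), with the matching pair on the diagonal as the target class. With `gather_across_devices: false`, negatives presumably come only from the local micro-batch of `per_device_train_batch_size` (32 in the hyperparameters below). Gradient accumulation sums gradients over micro-batches but does not enlarge that negative pool. Below is a minimal pure-Python sketch of the objective on toy vectors. `mnrl_loss` is a hypothetical helper; the library version operates on PyTorch tensors.

```python
import math
from typing import List


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mnrl_loss(anchors: List[List[float]], positives: List[List[float]], scale: float = 20.0) -> float:
    """Mean cross-entropy where positives[i] is the correct 'class' for anchors[i]
    and every other positives[j] in the batch acts as an in-batch negative."""
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * cosine(anchor, pos) for pos in positives]
        # Log-sum-exp with max subtraction for numerical stability
        peak = max(logits)
        log_norm = peak + math.log(sum(math.exp(z - peak) for z in logits))
        total += log_norm - logits[i]  # equals -log(softmax(logits)[i])
    return total / len(anchors)


anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]  # each positive points toward its own anchor
swapped = [[0.1, 0.9], [0.9, 0.1]]  # each positive points toward the other anchor
print(round(mnrl_loss(anchors, aligned), 4), round(mnrl_loss(anchors, swapped), 4))
```

The `scale` factor sharpens the softmax. With cosine scores in [-1, 1], a scale of 20 lets a modest similarity gap between the true pair and a negative translate into a confident prediction.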
Evaluation dataset:

Columns: `sentence1` and `sentence2`.

| | sentence1 | sentence2 |
|---|---|---|
| type | string | string |
Samples:

| sentence1 | sentence2 |
|---|---|
| plus 2 tempered glass screen protectors: | 6 पश्चात तापाभिसंतप्तॊ विदुर समार कर्शितः |
| "Take sadaqah (alms) from their wealth in order to purify them with it." (p. | अप्येकाङ्गेऽप्यधोवस्तुमिच्छामि च सुकुत्सिते" ॥ |
| "Who could it possibly be?" | कश्च तासेः सम्भवति ? |
Loss: `MultipleNegativesRankingLoss` with these parameters:

```json
{
    "scale": 20.0,
    "similarity_fct": "cos_sim",
    "gather_across_devices": false
}
```
Non-default hyperparameters:

- `per_device_train_batch_size`: 32
- `num_train_epochs`: 5
- `max_steps`: 12000
- `learning_rate`: 2e-05
- `warmup_steps`: 500
- `gradient_accumulation_steps`: 4
- `bf16`: True
- `eval_strategy`: steps
- `load_best_model_at_end`: True

All hyperparameters:

- `per_device_train_batch_size`: 32
- `num_train_epochs`: 5
- `max_steps`: 12000
- `learning_rate`: 2e-05
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: None
- `warmup_steps`: 500
- `optim`: adamw_torch_fused
- `optim_args`: None
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `optim_target_modules`: None
- `gradient_accumulation_steps`: 4
- `average_tokens_across_devices`: True
- `max_grad_norm`: 1.0
- `label_smoothing_factor`: 0.0
- `bf16`: True
- `fp16`: False
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `use_liger_kernel`: False
- `liger_kernel_config`: None
- `use_cache`: False
- `neftune_noise_alpha`: None
- `torch_empty_cache_steps`: None
- `auto_find_batch_size`: False
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `include_num_input_tokens_seen`: no
- `log_level`: passive
- `log_level_replica`: warning
- `disable_tqdm`: False
- `project`: huggingface
- `trackio_space_id`: trackio
- `eval_strategy`: steps
- `per_device_eval_batch_size`: 8
- `prediction_loss_only`: True
- `eval_on_start`: False
- `eval_do_concat_batches`: True
- `eval_use_gather_object`: False
- `eval_accumulation_steps`: None
- `include_for_metrics`: []
- `batch_eval_metrics`: False
- `save_only_model`: False
- `save_on_each_node`: False
- `enable_jit_checkpoint`: False
- `push_to_hub`: False
- `hub_private_repo`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_always_push`: False
- `hub_revision`: None
- `load_best_model_at_end`: True
- `ignore_data_skip`: False
- `restore_callback_states_from_checkpoint`: False
- `full_determinism`: False
- `seed`: 42
- `data_seed`: None
- `use_cpu`: False
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `parallelism_config`: None
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `dataloader_prefetch_factor`: None
- `remove_unused_columns`: True
- `label_names`: None
- `train_sampling_strategy`: random
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `ddp_backend`: None
- `ddp_timeout`: 1800
- `fsdp`: []
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `deepspeed`: None
- `debug`: []
- `skip_memory_metrics`: True
- `do_predict`: False
- `resume_from_checkpoint`: None
- `warmup_ratio`: None
- `local_rank`: -1
- `prompts`: None
- `batch_sampler`: batch_sampler
- `multi_dataset_batch_sampler`: proportional
- `router_mapping`: {}
- `learning_rate_mapping`: {}

Training logs:

| Epoch | Step | Training Loss | Validation Loss | eval-en-sa_mean_accuracy |
|---|---|---|---|---|
| 0.0034 | 100 | 3.1353 | - | - |
| 0.0068 | 200 | 2.7273 | - | - |
| 0.0102 | 300 | 1.8263 | - | - |
| 0.0137 | 400 | 1.1810 | - | - |
| 0.0171 | 500 | 0.8952 | - | - |
| 0.0205 | 600 | 0.7068 | - | - |
| 0.0239 | 700 | 0.5979 | - | - |
| 0.0273 | 800 | 0.5412 | - | - |
| 0.0307 | 900 | 0.5255 | - | - |
| 0.0341 | 1000 | 0.4847 | 0.2013 | 0.5045 |
| 0.0376 | 1100 | 0.4752 | - | - |
| 0.0410 | 1200 | 0.4645 | - | - |
| 0.0444 | 1300 | 0.4173 | - | - |
| 0.0478 | 1400 | 0.4220 | - | - |
| 0.0512 | 1500 | 0.4163 | - | - |
| 0.0546 | 1600 | 0.3978 | - | - |
| 0.0580 | 1700 | 0.3895 | - | - |
| 0.0614 | 1800 | 0.3778 | - | - |
| 0.0649 | 1900 | 0.3904 | - | - |
| 0.0683 | 2000 | 0.3656 | 0.1436 | 0.563 |
| 0.0717 | 2100 | 0.3565 | - | - |
| 0.0751 | 2200 | 0.3526 | - | - |
| 0.0785 | 2300 | 0.3632 | - | - |
| 0.0819 | 2400 | 0.3468 | - | - |
| 0.0853 | 2500 | 0.3506 | - | - |
| 0.0888 | 2600 | 0.3505 | - | - |
| 0.0922 | 2700 | 0.3466 | - | - |
| 0.0956 | 2800 | 0.3422 | - | - |
| 0.0990 | 2900 | 0.3393 | - | - |
| 0.1024 | 3000 | 0.3345 | 0.1240 | 0.587 |
| 0.1058 | 3100 | 0.3238 | - | - |
| 0.1092 | 3200 | 0.3230 | - | - |
| 0.1127 | 3300 | 0.3281 | - | - |
| 0.1161 | 3400 | 0.3246 | - | - |
| 0.1195 | 3500 | 0.3111 | - | - |
| 0.1229 | 3600 | 0.3092 | - | - |
| 0.1263 | 3700 | 0.3187 | - | - |
| 0.1297 | 3800 | 0.3293 | - | - |
| 0.1331 | 3900 | 0.3246 | - | - |
| 0.1366 | 4000 | 0.3174 | 0.1165 | 0.598 |
| 0.1400 | 4100 | 0.3213 | - | - |
| 0.1434 | 4200 | 0.3167 | - | - |
| 0.1468 | 4300 | 0.3142 | - | - |
| 0.1502 | 4400 | 0.3070 | - | - |
| 0.1536 | 4500 | 0.3094 | - | - |
| 0.1570 | 4600 | 0.3084 | - | - |
| 0.1604 | 4700 | 0.3068 | - | - |
| 0.1639 | 4800 | 0.3060 | - | - |
| 0.1673 | 4900 | 0.3020 | - | - |
| 0.1707 | 5000 | 0.3072 | 0.1133 | 0.6045 |
| 0.1741 | 5100 | 0.3151 | - | - |
| 0.1775 | 5200 | 0.3121 | - | - |
| 0.1809 | 5300 | 0.3059 | - | - |
| 0.1843 | 5400 | 0.3069 | - | - |
| 0.1878 | 5500 | 0.3069 | - | - |
| 0.1912 | 5600 | 0.3134 | - | - |
| 0.1946 | 5700 | 0.3017 | - | - |
| 0.1980 | 5800 | 0.3088 | - | - |
| 0.2014 | 5900 | 0.3011 | - | - |
| 0.2048 | 6000 | 0.3075 | 0.1109 | 0.608 |
| 0.2082 | 6100 | 0.2957 | - | - |
| 0.2117 | 6200 | 0.3049 | - | - |
| 0.2151 | 6300 | 0.2994 | - | - |
| 0.2185 | 6400 | 0.2951 | - | - |
| 0.2219 | 6500 | 0.3116 | - | - |
| 0.2253 | 6600 | 0.3155 | - | - |
| 0.2287 | 6700 | 0.2938 | - | - |
| 0.2321 | 6800 | 0.2824 | - | - |
| 0.2355 | 6900 | 0.2973 | - | - |
| 0.2390 | 7000 | 0.3111 | 0.1100 | 0.6065 |
| 0.2424 | 7100 | 0.2973 | - | - |
| 0.2458 | 7200 | 0.2995 | - | - |
| 0.2492 | 7300 | 0.2962 | - | - |
| 0.2526 | 7400 | 0.2994 | - | - |
| 0.2560 | 7500 | 0.2964 | - | - |
| 0.2594 | 7600 | 0.2997 | - | - |
| 0.2629 | 7700 | 0.2932 | - | - |
| 0.2663 | 7800 | 0.2993 | - | - |
| 0.2697 | 7900 | 0.2987 | - | - |
| 0.2731 | 8000 | 0.2898 | 0.1084 | 0.6085 |
| 0.2765 | 8100 | 0.3007 | - | - |
| 0.2799 | 8200 | 0.2935 | - | - |
| 0.2833 | 8300 | 0.2885 | - | - |
| 0.2868 | 8400 | 0.3021 | - | - |
| 0.2902 | 8500 | 0.2958 | - | - |
| 0.2936 | 8600 | 0.3056 | - | - |
| 0.2970 | 8700 | 0.2908 | - | - |
| 0.3004 | 8800 | 0.3096 | - | - |
| 0.3038 | 8900 | 0.2924 | - | - |
| 0.3072 | 9000 | 0.3019 | 0.1077 | 0.607 |
| 0.3107 | 9100 | 0.2985 | - | - |
| 0.3141 | 9200 | 0.2906 | - | - |
| 0.3175 | 9300 | 0.2961 | - | - |
| 0.3209 | 9400 | 0.3044 | - | - |
| 0.3243 | 9500 | 0.3005 | - | - |
| 0.3277 | 9600 | 0.2943 | - | - |
| 0.3311 | 9700 | 0.2948 | - | - |
| 0.3345 | 9800 | 0.3046 | - | - |
| 0.3380 | 9900 | 0.2948 | - | - |
| 0.3414 | 10000 | 0.3060 | 0.1083 | 0.608 |
| 0.3448 | 10100 | 0.2906 | - | - |
| 0.3482 | 10200 | 0.2958 | - | - |
| 0.3516 | 10300 | 0.2919 | - | - |
| 0.3550 | 10400 | 0.3041 | - | - |
| 0.3584 | 10500 | 0.3055 | - | - |
| 0.3619 | 10600 | 0.2975 | - | - |
| 0.3653 | 10700 | 0.2984 | - | - |
| 0.3687 | 10800 | 0.2883 | - | - |
| 0.3721 | 10900 | 0.2949 | - | - |
| 0.3755 | 11000 | 0.2987 | 0.1083 | 0.6085 |
| 0.3789 | 11100 | 0.2938 | - | - |
| 0.3823 | 11200 | 0.2942 | - | - |
| 0.3858 | 11300 | 0.2879 | - | - |
| 0.3892 | 11400 | 0.2909 | - | - |
| 0.3926 | 11500 | 0.2899 | - | - |
| 0.3960 | 11600 | 0.2921 | - | - |
| 0.3994 | 11700 | 0.2944 | - | - |
| 0.4028 | 11800 | 0.2985 | - | - |
| 0.4062 | 11900 | 0.3027 | - | - |
| 0.4097 | 12000 | 0.2988 | 0.1082 | 0.61 |
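The hyperparameters above set `lr_scheduler_type: linear`, `warmup_steps: 500`, `learning_rate: 2e-05` and `max_steps: 12000`. Under that configuration the learning rate ramps up linearly over the first 500 optimizer steps and then decays linearly to zero at step 12000. In the Hugging Face `Trainer`, a positive `max_steps` overrides `num_train_epochs`. This is consistent with the log ending at epoch 0.4097 rather than 5. Each optimizer step accumulates 32 × 4 = 128 pairs per device (`per_device_train_batch_size` × `gradient_accumulation_steps`); the card does not state the device count. The sketch below reproduces the schedule shape in plain Python. `linear_lr` is a hypothetical helper written to mirror the documented behaviour of `transformers.get_linear_schedule_with_warmup`, not that function itself.

```python
def linear_lr(step: int, base_lr: float = 2e-05, warmup_steps: int = 500, total_steps: int = 12000) -> float:
    """Linear warmup from 0 to base_lr, then linear decay to 0 at total_steps."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Sample the schedule at a few of the logged steps
for step in (0, 250, 500, 6250, 12000):
    print(step, linear_lr(step))
```

Most of the drop in training loss in the log happens during and shortly after warmup, while the learning rate is near its peak. Late in training the rate approaches zero, and the training and validation losses change very little.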
Sentence Transformers:

```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = {Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks},
    author = {Reimers, Nils and Gurevych, Iryna},
    booktitle = {Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing},
    month = {11},
    year = {2019},
    publisher = {Association for Computational Linguistics},
    url = {https://arxiv.org/abs/1908.10084},
}
```

MultipleNegativesRankingLoss:

```bibtex
@misc{henderson2017efficient,
    title = {Efficient Natural Language Response Suggestion for Smart Reply},
    author = {Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year = {2017},
    eprint = {1705.00652},
    archivePrefix = {arXiv},
    primaryClass = {cs.CL}
}
```
Base model: jhu-clsp/mmBERT-base