# AffilBERT

A ModernBERT embedding model, based on Nomic's ModernBERT embed base, fine-tuned with a contrastive loss on the names of research institutions. This model is intended for researcher affiliation canonicalization.
## Description

The embeddings can be used to link or standardize researcher affiliations by measuring the cosine similarity between two encoded representations. Standard embedding models, however, frequently confound geographic or topical commonalities with affiliation identity: "boston university computer science" may end up closer to "college of charleston computer science" than to "boston university department of public health".
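The linking step described above reduces to a nearest-neighbor search over cosine similarities. A minimal sketch, using placeholder vectors in place of real model embeddings (the `canonicalize` helper and the `threshold` value are illustrative assumptions, not part of the model's API):

```python
import math

def cosine(a, b):
    # Plain cosine similarity between two dense vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def canonicalize(query_vec, canon_vecs, canon_names, threshold=0.8):
    # Link a raw affiliation vector to its most similar canonical name,
    # or return None when nothing clears the (illustrative) threshold.
    score, name = max(
        (cosine(query_vec, v), n) for v, n in zip(canon_vecs, canon_names)
    )
    return name if score >= threshold else None

# Toy demo with hand-picked 2-d vectors standing in for embeddings:
canon_names = ["Boston University", "Clemson University"]
canon_vecs = [[1.0, 0.0], [0.0, 1.0]]
match = canonicalize([0.9, 0.1], canon_vecs, canon_names)
```

In practice the vectors would come from the model as shown in the Usage section, and the threshold would be tuned on held-out annotated pairs.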
## Training

The model was trained with hard-negative mining and an InfoNCE loss on a mixture of hand-annotated affiliation data gathered from PubMed and data sourced from ROR. Hard negatives were identified using TF-IDF, combined with false-positive high-similarity pairs found by encoding strings with the base embedding model.

The result is a finetune that separates different institutions with confounding commonalities more aggressively than the base model does.
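To make the TF-IDF mining step concrete, here is a minimal, dependency-free sketch of how lexically similar but distinct institution names can be surfaced as hard-negative candidates. The `hard_negatives` helper is hypothetical; the model's actual training pipeline is not published here:

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    # Sparse TF-IDF vectors as {token: weight} dicts.
    tokenized = [d.lower().split() for d in docs]
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))
    n = len(docs)
    vecs = []
    for toks in tokenized:
        tf = Counter(toks)
        vecs.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return vecs

def cosine(u, v):
    dot = sum(u[t] * v.get(t, 0.0) for t in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def hard_negatives(names, k=1):
    # For each name, return the k most TF-IDF-similar *other* names:
    # lexically close strings that name different institutions are
    # exactly the confusable pairs contrastive training should separate.
    vecs = tfidf_vectors(names)
    out = {}
    for i, name in enumerate(names):
        scored = sorted(
            ((cosine(vecs[i], vecs[j]), names[j])
             for j in range(len(names)) if j != i),
            reverse=True,
        )
        out[name] = [n for _, n in scored[:k]]
    return out
```

The second mining signal mentioned above (false-positive high-similarity pairs under the base model) would follow the same pattern, with base-model embeddings in place of TF-IDF vectors.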

## Usage

```python
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

model_id = "aimgo/AffilBERT"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id).eval()

def embed(texts):
    # "clustering: " is the task prefix convention of the Nomic embed base.
    enc = tokenizer(
        ["clustering: " + t for t in texts],
        padding=True,
        truncation=True,
        max_length=128,
        return_tensors="pt",
    )
    with torch.no_grad():
        out = model(**enc).last_hidden_state
    # Mean-pool over non-padding tokens, then L2-normalize.
    mask = enc["attention_mask"].unsqueeze(-1).float()
    emb = (out * mask).sum(1) / mask.sum(1).clamp(min=1e-9)
    return F.normalize(emb, p=2, dim=-1)

strings = [
    "boston university computer science",
    "harvard college computer science",
    "college of charleston",
    "cofc",
    "university of south carolina",
    "clemson university",
    "boston university public health",
]

x = embed(strings)
sim = (x @ x.t()).tolist()  # pairwise cosine similarities
```
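The `sim` matrix computed above can be turned directly into a linking decision. A small illustrative helper (the `closest_matches` name is an assumption, not part of the model's API) that reports each string's nearest neighbor:

```python
def closest_matches(sim, names):
    # For each name, find the most similar *other* name and its score,
    # given a square matrix of pairwise cosine similarities.
    matches = {}
    for i, name in enumerate(names):
        j = max(
            (j for j in range(len(names)) if j != i),
            key=lambda j: sim[i][j],
        )
        matches[name] = (names[j], sim[i][j])
    return matches

# With the snippet above: matches = closest_matches(sim, strings)
```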
## Citation

If you use this model in your work, please cite:

```bibtex
@misc{mccarthy2026AffilBERT,
  author       = {McCarthy, A. M. and Rao, Sowmya R.},
  title        = {{AffilBERT}},
  year         = {2026},
  howpublished = {\url{https://huggingface.co/aimgo/AffilBERT}},
  note         = {Model}
}
```