modernbert-embed-base-yat
nomic-ai/modernbert-embed-base with every GeGLU feed-forward block replaced by a
sigmoid-gated Yat-kernel MLP (an alignment / inverse-distance kernel primitive, no GELU/GeGLU).
Only the 22 feed-forward blocks are changed; attention, embeddings and norms are the base model's.
The Yat FFNs are fit by end-to-end last-layer distillation. Everything except the FFNs is frozen, and the FFNs are trained for one epoch over all-nli sentences so that the model's final hidden state matches the frozen GeGLU teacher. The loss is a normalized MSE on the last-layer hidden states plus a cosine term on the mean-pooled embedding. Matching only the final representation, rather than imitating each GeGLU block pointwise (which hits a function-class ceiling), lets the kernel layers reallocate computation and recover full teacher parity.
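A minimal sketch of the distillation objective described above, in plain Python so the arithmetic is visible. The card does not define "normalized", so the sketch assumes it means dividing the squared error by the teacher's squared magnitude. The names `distill_loss` and `cos_weight` are hypothetical, and padding masks are ignored.

```python
import math

def mean_pool(hidden):
    """Average token vectors (a list of equal-length lists) into one sentence vector."""
    n = len(hidden)
    return [sum(tok[i] for tok in hidden) / n for i in range(len(hidden[0]))]

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def distill_loss(student_hidden, teacher_hidden, cos_weight=1.0):
    """Normalized MSE on last-layer states plus (1 - cosine) on pooled embeddings.

    Assumptions (not from the card): the squared error is normalized by the
    teacher's squared magnitude, and cos_weight is an illustrative knob.
    """
    sq_err = sum((s - t) ** 2
                 for s_tok, t_tok in zip(student_hidden, teacher_hidden)
                 for s, t in zip(s_tok, t_tok))
    teacher_energy = sum(t * t for t_tok in teacher_hidden for t in t_tok)
    nmse = sq_err / teacher_energy
    cos_term = 1.0 - cosine(mean_pool(student_hidden), mean_pool(teacher_hidden))
    return nmse + cos_weight * cos_term

# Toy last-layer states: two tokens, hidden size two.
teacher = [[1.0, 2.0], [3.0, 4.0]]
student = [[1.0, 2.0], [4.0, 3.0]]
loss = distill_loss(student, teacher)
```

In an actual training loop only the FFN parameters would receive gradients from this loss; the teacher states come from the frozen GeGLU model.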
Evaluation (MTEB STS, cosine Spearman)
| Task | base modernbert-embed-base (GeGLU) | this model (Yat) |
|---|---|---|
| STSBenchmark | 0.835 | 0.815 |
| STS12 | 0.676 | 0.718 |
| STS16 | 0.835 | 0.814 |
| average | 0.782 | 0.782 |
The Yat-kernel swap reaches the same average STS as the GeGLU base (0.782): it is ahead on STS12 (+0.042) and about 0.02 behind on STSBenchmark and STS16.
Scores are reproduced after a Hub round-trip (trust_remote_code=True). The bundled custom
architecture (modeling_yatmodernbert.py) is loaded automatically.
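For readers unfamiliar with the metric, here is a small sketch of how a cosine Spearman score is computed: take the cosine similarity of each sentence pair's embeddings, then rank-correlate those similarities with the gold scores. This is an illustration in plain Python, not the MTEB implementation; the function names and the toy data are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # average rank of the tied group
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation; assumes neither list is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def cosine_spearman(emb_a, emb_b, gold):
    """Spearman correlation between pairwise cosine similarities and gold scores."""
    sims = [cosine(a, b) for a, b in zip(emb_a, emb_b)]
    return pearson(ranks(sims), ranks(gold))

# Toy pairs: similarity decreases from the first pair to the last.
emb_a = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
emb_b = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
gold = [5.0, 3.0, 0.5]
score = cosine_spearman(emb_a, emb_b, gold)
```

Because the metric only depends on the ordering of similarities, two models can produce different embeddings and still score the same.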
Usage
```python
from sentence_transformers import SentenceTransformer

# trust_remote_code=True lets the Hub load the bundled modeling_yatmodernbert.py
m = SentenceTransformer("mlnomad/modernbert-embed-base-yat", trust_remote_code=True)
emb = m.encode(["A man is eating food.", "A man is eating a meal."])
```
The Yat FFN
g(x) = ( softplus(a) * (x·W + b)^2 / (||x - W||^2 + exp(le)) * sigmoid(gate(x)) ) @ A + c
a non-negative rational (alignment-over-distance) kernel feature with a sigmoid gate, replacing the
GeGLU map d -> 4d -> d at the same hidden width.
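The formula above can be read unit by unit. The sketch below is one plausible reading in plain Python, not the code in modeling_yatmodernbert.py. It assumes that `a` and `le` are scalars, that `gate(x)` is a linear map (hypothetical parameters `G` and `gb`), and that the distance term is taken between `x` and each hidden unit's weight vector. `W` is stored here as one row per hidden unit.

```python
import math

def softplus(z):
    return math.log1p(math.exp(z))

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def yat_features(x, W, b, a, le, G, gb):
    """Gated kernel features, one per hidden unit.

    x: input vector of size d. W and G: one d-vector per hidden unit.
    Each feature is softplus(a) * alignment^2 / (distance^2 + exp(le)) * gate,
    which is non-negative because every factor is non-negative.
    """
    feats = []
    for w_j, b_j, g_j, gb_j in zip(W, b, G, gb):
        align = (sum(xi * wi for xi, wi in zip(x, w_j)) + b_j) ** 2
        dist = sum((xi - wi) ** 2 for xi, wi in zip(x, w_j)) + math.exp(le)
        gate = sigmoid(sum(xi * gi for xi, gi in zip(x, g_j)) + gb_j)
        feats.append(softplus(a) * align / dist * gate)
    return feats

def yat_ffn(x, W, b, a, le, G, gb, A, c):
    """Project the features back to the model width: features @ A + c."""
    feats = yat_features(x, W, b, a, le, G, gb)
    return [c_k + sum(f_j * A_j[k] for f_j, A_j in zip(feats, A))
            for k, c_k in enumerate(c)]

# Tiny example: d = 2, one hidden unit whose weight vector equals the input.
x = [1.0, 0.0]
out = yat_ffn(x, W=[[1.0, 0.0]], b=[0.0], a=0.0, le=0.0,
              G=[[0.0, 0.0]], gb=[0.0], A=[[2.0, -1.0]], c=[0.5, 0.5])
```

The `exp(le)` term keeps the denominator strictly positive, so the feature stays finite even when `x` coincides with a weight vector, as in the example.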
Notes
- Reaches teacher parity by distillation; this is the ceiling of distillation (matches, does not beat the base). A light contrastive fine-tune of the FFNs alone does not exceed the base.
- Quantizes losslessly to int8 (weight-only PTQ leaves STS unchanged); below 8 bits the rational kernel is more sensitive than GeGLU under naive uniform quantization.
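To make "naive uniform quantization" concrete, here is a small sketch of symmetric per-tensor weight quantization followed by dequantization. The card does not state the exact scheme used, so the per-tensor symmetric choice is an assumption and the helper names are hypothetical. The sketch only shows how weight round-trip error grows as the bit width shrinks; it does not reproduce the STS measurements above.

```python
def quantize_dequantize(weights, bits=8):
    """Symmetric per-tensor uniform quantization, then dequantization."""
    qmax = 2 ** (bits - 1) - 1                      # 127 for int8, 7 for int4
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0  # one scale for the whole tensor
    codes = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return [q * scale for q in codes]

def max_abs_error(weights, bits):
    """Largest absolute difference between original and round-tripped weights."""
    restored = quantize_dequantize(weights, bits)
    return max(abs(w - r) for w, r in zip(weights, restored))

# Toy weight tensor (illustrative values only).
weights = [0.5, -0.25, 0.1, -0.03, 1.0]
err8 = max_abs_error(weights, 8)
err4 = max_abs_error(weights, 4)
```

With a single scale per tensor, the worst-case rounding error is half a quantization step, so it roughly doubles for every bit removed.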
Part of the ⵟ-kernel research project (kernel-native replacements for transformer FFNs).
Model tree for mlnomad/modernbert-embed-base-yat
Base model
answerdotai/ModernBERT-base
Evaluation results
- cosine_spearman on MTEB STSBenchmark (self-reported): 0.815
- cosine_spearman on MTEB STS12 (self-reported): 0.718
- cosine_spearman on MTEB STS16 (self-reported): 0.814