BiomedBERT Small: Medical models at 22.7M parameters

Community Article Published April 28, 2026

This article introduces the new BiomedBERT Small series of models: a solid-performing small (22.7M parameter) series that fits between the 110M parameter BiomedBERT Base model and the tiny BiomedBERT Hash series of models.

These models are solid performers in both speed and accuracy. The dense embeddings model even beats the original PubMedBERT Embeddings model across the board at only 20% of the parameters. It also does much better than all-MiniLM-L6-v2, a commonly used model of roughly the same size.

Being comparable in size to the popular all-MiniLM-L6-v2 also means these models can run in CPU-only environments.

The following new models are released as part of this effort. All models have an Apache 2.0 license.

| Model | Description |
|---|---|
| biomedbert-small | Base 22.7M parameter language model |
| biomedbert-small-embeddings | Small Sentence Transformers model for embeddings |
| biomedbert-small-colbert | Late interaction (ColBERT) small model |
| biomedbert-base-embeddings | Improved Base Sentence Transformers model for embeddings |

Building a Strong Baseline

In order to create task-specific models, a strong baseline is necessary. A 22.7M parameter BERT encoder-only model was trained on data from PubMed. The raw data was transformed using PaperETL with the results stored as a local dataset via the Hugging Face Datasets library. Masked language modeling was the training objective.
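As a sketch of the masked language modeling objective, the standard BERT 80/10/10 masking rule can be implemented as follows. This mirrors what the Hugging Face data collator does during training; the function below is illustrative, not the exact training code used here.

```python
import torch

def mask_tokens(input_ids, mask_token_id, vocab_size, mlm_probability=0.15, generator=None):
    """Apply the standard BERT masking rule: select ~15% of tokens as
    prediction targets, then replace 80% of those with [MASK], 10% with a
    random token, and leave 10% unchanged."""
    labels = input_ids.clone()

    # Sample which positions become prediction targets
    probability_matrix = torch.full(labels.shape, mlm_probability)
    masked_indices = torch.bernoulli(probability_matrix, generator=generator).bool()
    labels[~masked_indices] = -100  # non-target positions are ignored by the loss

    # Work on a copy so the caller's tensor is not mutated
    input_ids = input_ids.clone()

    # 80% of targets -> [MASK]
    replaced = torch.bernoulli(torch.full(labels.shape, 0.8), generator=generator).bool() & masked_indices
    input_ids[replaced] = mask_token_id

    # Half of the remaining targets (10% overall) -> random token
    random_indices = torch.bernoulli(torch.full(labels.shape, 0.5), generator=generator).bool() & masked_indices & ~replaced
    random_tokens = torch.randint(vocab_size, labels.shape)
    input_ids[random_indices] = random_tokens[random_indices]

    # The final 10% of targets keep their original token
    return input_ids, labels
```

The model is then trained with cross-entropy loss over only the target positions (the `-100` label value tells the loss to skip everything else).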

After training, the model was evaluated using this Medical Abstracts Text Classification Dataset. A handful of biomedical models and general models were selected for comparison.

| Model | Parameters | Accuracy | Loss |
|---|---|---|---|
| biomedbert-hash-nano | 0.969M | 0.6195 | 0.9464 |
| biomedbert-small | 22.7M | 0.6274 | 0.8647 |
| bert-base-uncased | 110M | 0.6118 | 0.9712 |
| biomedbert-base | 110M | 0.6195 | 0.9037 |
| ModernBERT-base | 149M | 0.5672 | 1.1079 |
| BioClinical-ModernBERT-base | 149M | 0.5679 | 1.0915 |

As we can see, this model performs very well against much larger models, and it serves as a strong baseline for the task-specific models below.
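For context, the accuracy and loss columns in a table like the one above are typically computed from the classifier's raw logits. A minimal sketch, with toy tensors standing in for real model outputs:

```python
import torch
import torch.nn.functional as F

def evaluate(logits, labels):
    """Compute classification accuracy and mean cross-entropy loss
    from raw model logits (shape: [n_examples, n_classes])."""
    loss = F.cross_entropy(logits, labels).item()
    accuracy = (logits.argmax(dim=-1) == labels).float().mean().item()
    return accuracy, loss
```

In practice these logits come from a sequence classification head fine-tuned on top of each encoder.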


Training a Small Embeddings model

With this strong baseline and teacher model, we can now train a small embeddings model.

biomedbert-small-embeddings was trained using Sentence Transformers. The training dataset was generated using a random sample of PubMed title-abstract pairs along with similar title pairs.

The training workflow was a distillation process with the following steps.

  • Distill embeddings from the larger pubmedbert-base-embeddings model using this model distillation script from Sentence Transformers.
  • Build a distilled dataset of teacher scores using the biomedbert-base-reranker cross-encoder for a separate random sample of title-abstract pairs.
  • Further fine-tune the model on the distilled dataset using KLDivLoss.
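The two distillation losses in the steps above can be sketched in plain PyTorch. The tensors below are random stand-ins for real model outputs; in the actual workflow, Sentence Transformers handles this during training.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Step 1: embedding distillation - the student is trained so that its
# sentence embeddings match the teacher's (MSE between embedding matrices).
teacher_emb = torch.randn(8, 768)                       # teacher sentence embeddings
student_emb = teacher_emb + 0.1 * torch.randn(8, 768)   # toy student output
mse_loss = F.mse_loss(student_emb, teacher_emb)

# Step 2: score distillation - the cross-encoder's relevance scores over a
# set of candidate passages per query define a target distribution; the
# student's similarity scores are pushed toward it with KL divergence.
teacher_scores = torch.randn(4, 10)                     # cross-encoder scores per query
student_scores = teacher_scores + 0.5 * torch.randn(4, 10)
kl_loss = F.kl_div(
    F.log_softmax(student_scores, dim=-1),  # student log-probabilities
    F.softmax(teacher_scores, dim=-1),      # teacher probabilities
    reduction="batchmean",
)
```

Note the convention of `F.kl_div`: the student input must be log-probabilities while the teacher target is plain probabilities.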

Training ColBERT models

A methodology similar to the one above was employed to train biomedbert-small-colbert, as follows.

  • Train a model with MSELoss using biomedbert-small-embeddings as the base model.
  • Build a distilled dataset of teacher scores using the biomedbert-base-reranker cross-encoder for a separate random sample of title-abstract pairs.
  • Fine-tune the model on the distilled dataset using KLDivLoss.
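What distinguishes a ColBERT model from a dense embeddings model is how it scores at query time: late interaction keeps one embedding per token and combines them with MaxSim. The sketch below is an illustrative implementation of that scoring, not the exact code used to train or serve these models.

```python
import torch

def maxsim(query_emb, doc_emb):
    """ColBERT-style late interaction score: for each query token, take the
    maximum cosine similarity against all document tokens, then sum.
    query_emb: [n_query_tokens, dim], doc_emb: [n_doc_tokens, dim]."""
    q = torch.nn.functional.normalize(query_emb, dim=-1)
    d = torch.nn.functional.normalize(doc_emb, dim=-1)
    sim = q @ d.T                        # [n_query_tokens, n_doc_tokens]
    return sim.max(dim=-1).values.sum().item()
```

Because each query token gets to "pick" its best-matching document token, short keyword-style queries tend to benefit most from this scoring scheme.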

Fine-Tuning the base PubMedBERT Embeddings model

The original PubMedBERT Embeddings model was released almost 3 years ago. It gets over 500K downloads a month and has been cited many times in the literature.

A simple idea was explored as part of this effort. What if we fine-tuned this model on the same distilled dataset used for the nano and small series of models? Turns out this adds a sizable performance boost to the base model as shown below.


Evaluation Results

The performance of these models is compared to previously released models trained on medical literature. The most commonly used small embeddings model is also included for comparison.

The following datasets were used to evaluate model performance.

  • PubMed QA
    • Subset: pqa_labeled, Split: train, Pair: (question, long_answer)
  • PubMed Subset
    • Split: test, Pair: (title, text)
  • PubMed Summary
    • Subset: pubmed, Split: validation, Pair: (article, abstract)

Evaluation results are shown below. The Pearson correlation coefficient is used as the evaluation metric.

| Model | PubMed QA | PubMed Subset | PubMed Summary | Average |
|---|---|---|---|---|
| all-MiniLM-L6-v2 | 90.40 | 95.92 | 94.07 | 93.46 |
| biomedbert-base-colbert | 94.59 | 97.18 | 96.21 | 95.99 |
| biomedbert-base-embeddings | 94.60 | 98.39 | 97.61 | 96.87 |
| biomedbert-base-reranker | 97.66 | 99.76 | 98.81 | 98.74 |
| biomedbert-small-colbert | 93.51 | 97.20 | 95.85 | 95.52 |
| biomedbert-small-embeddings | 93.25 | 97.93 | 96.65 | 95.94 |
| biomedbert-hash-nano-embeddings | 90.39 | 96.29 | 95.32 | 94.00 |
| pubmedbert-base-embeddings | 93.27 | 97.00 | 96.58 | 95.62 |
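For reference, the Pearson metric itself is straightforward to compute between a model's similarity scores and the reference scores. This is a minimal sketch; libraries such as SciPy provide the same calculation.

```python
import numpy as np

def pearson(predicted, gold):
    """Pearson correlation coefficient between model similarity scores
    and reference scores."""
    predicted = np.asarray(predicted, dtype=float)
    gold = np.asarray(gold, dtype=float)
    # Center both vectors, then take the cosine of the centered vectors
    p = predicted - predicted.mean()
    g = gold - gold.mean()
    return float((p @ g) / (np.linalg.norm(p) * np.linalg.norm(g)))
```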

The 22.7M parameter small models pack quite a punch. biomedbert-small-embeddings beats the original PubMedBERT Embeddings model across the board at only 20% of the parameters.

As with other ColBERT models on these datasets, biomedbert-small-colbert tends to score lower on longer-form queries. But note how it outperforms its equivalent small embeddings model on the PubMed QA dataset. For traditional user queries, this model will likely produce better results in production.

Lastly, the new biomedbert-base-embeddings model is a sizable jump over the original PubMedBERT Embeddings model.


Wrapping up

This article introduced the new BiomedBERT Small series of models. It also introduced a new strong-performing, standard-sized dense embeddings model.

If you're interested in building custom models like this for your data or domain area, feel free to reach out!

NeuML is the company behind txtai and we provide AI consulting services around our stack. Schedule a meeting or send a message to learn more.

We're also building an easy and secure way to run hosted txtai applications with txtai.cloud.
