BOND-reranker
A cross-encoder reranker model fine-tuned for biomedical ontology entity normalization, designed to work with the BOND (Biomedical Ontology Neural Disambiguation) system.
Model Description
This model is a cross-encoder reranker trained to improve the accuracy of entity normalization by re-ranking candidate ontology terms retrieved by BOND's initial retrieval stage. It takes a query-candidate pair and outputs a relevance score.
Training Framework: Sentence Transformers with cross-encoder architecture
Model Architecture
- Type: Cross-Encoder
- Framework: Sentence Transformers
- Max Sequence Length: 512 tokens
- Output: Single relevance score per query-candidate pair
- Parameters: ~110M (based on BiomedBERT-base)
Training Data
The model was trained on biomedical entity normalization data covering multiple ontologies including:
- MONDO (diseases)
- HPO (phenotypes)
- UBERON (anatomy)
- Cell Ontology (CL)
- Gene Ontology (GO)
- And other biomedical ontologies
Training data consists of query-candidate pairs with relevance labels, where queries are biomedical entity mentions and candidates are ontology terms.
Usage
With BOND Pipeline
from bond.config import BondSettings
from bond.pipeline import BondMatcher
# Configure BOND to use this reranker
settings = BondSettings(
    "model_path",  # Replace with your model path
    enable_reranker=True
)
matcher = BondMatcher(settings=settings)
Direct Usage
import torch
from sentence_transformers import CrossEncoder
# Load model from local path
model = CrossEncoder(
    "model_path",  # Replace with your model path
    device='cuda' if torch.cuda.is_available() else 'cpu'
)
# Example: Rank candidates for a query
query = "cell_type: C_BEST4; tissue: descending colon; organism: Homo sapiens"
candidates = [
    "label: smooth muscle fiber of descending colon; synonyms: non-striated muscle fiber of descending colon",
    "label: smooth muscle cell of colon; synonyms: non-striated muscle fiber of colon",
    "label: epithelial cell of colon; synonyms: colon epithelial cell"
]
# Get ranked results with probabilities
ranked_results = model.rank(query, candidates, return_documents=True, top_k=3)
print("Top 3 ranked results")
for result in ranked_results:
    prob = torch.sigmoid(torch.tensor(result['score'])).item()
    print(f"{prob:.8f} - {result['text']}")
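The query and candidates in the example above are plain strings of `key: value` fields joined by `; `. A minimal sketch of helpers that build strings in that shape is shown here. The function names `format_query` and `format_candidate` are hypothetical and not part of BOND, and the comma used to join multiple synonyms is an assumption, since the example only shows one synonym per candidate.

```python
from typing import Mapping, Sequence


def format_query(fields: Mapping[str, str]) -> str:
    """Join non-empty 'key: value' pairs with '; ', mirroring the example query."""
    return "; ".join(f"{key}: {value}" for key, value in fields.items() if value)


def format_candidate(label: str, synonyms: Sequence[str] = ()) -> str:
    """Build a 'label: ...; synonyms: ...' string like the example candidates.

    Assumption: multiple synonyms are comma-separated; the model card does not say.
    """
    text = f"label: {label}"
    if synonyms:
        text += "; synonyms: " + ", ".join(synonyms)
    return text


query = format_query(
    {"cell_type": "C_BEST4", "tissue": "descending colon", "organism": "Homo sapiens"}
)
candidate = format_candidate("epithelial cell of colon", ["colon epithelial cell"])
print(query)
print(candidate)
```

Keeping the field order and separators identical to the strings the model saw during training is likely to matter, so check the output of any such helper against the examples before relying on it.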
Performance
This reranker is the final scoring stage of the BOND pipeline, which runs in three steps:
- Retrieval: Exact + BM25 + Dense retrieval with LLM expansion
- Reranking: This cross-encoder model scores and re-ranks top candidates
- Output: Final ranked list of ontology terms
The reranker improves precision by re-scoring only the top-k candidates (typically k=100) returned by the initial retrieval stage, so the more expensive cross-encoder never has to score the full ontology.
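The retrieve-then-rerank flow described above can be sketched as follows. This is an illustration under stated assumptions, not BOND's implementation: `rerank_top_k` and `token_overlap` are hypothetical names, and the token-overlap scorer is only a stand-in so the example runs without downloading a model. In practice the scoring function would wrap the cross-encoder, for example by calling `model.predict` on the `(query, candidate)` pair.

```python
from typing import Callable, List, Sequence, Tuple


def rerank_top_k(
    query: str,
    candidates: Sequence[str],
    score_fn: Callable[[str, str], float],
    k: int = 100,
) -> List[Tuple[float, str]]:
    """Re-score only the first k retrieved candidates and sort them by score."""
    head = list(candidates[:k])  # candidates beyond k are dropped, not scored
    scored = [(score_fn(query, cand), cand) for cand in head]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored


def token_overlap(query: str, candidate: str) -> float:
    """Stand-in scorer (Jaccard overlap of lowercase tokens), NOT the real model."""
    q = set(query.lower().split())
    c = set(candidate.lower().split())
    return len(q & c) / len(q | c) if (q | c) else 0.0


# Pretend these came back from the retrieval stage, in retrieval order.
retrieved = ["smooth muscle cell of colon", "epithelial cell of colon", "colon"]
ranked = rerank_top_k("epithelial cell colon", retrieved, token_overlap, k=2)
for score, text in ranked:
    print(f"{score:.2f} - {text}")
```

Passing the scorer in as a function keeps the top-k logic independent of the model, which makes it easy to test the pipeline with a cheap stand-in before plugging in the cross-encoder.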
Evaluation Metrics
Evaluated on a biomedical entity normalization development set:
| Metric | Score |
|---|---|
| Accuracy | 97.50% |
| F1 Score | 82.37% |
| Precision | 79.58% |
| Recall | 85.36% |
| Average Precision | 88.67% |
| Eval Loss | 0.230 |
Best Model: Checkpoint at step 69,500 (epoch 2.28) with best metric score of 0.9734
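As a quick sanity check on the table, F1 is the harmonic mean of precision and recall, so the reported F1 can be recomputed from the other two rows. The snippet below uses only the values from the table; the variable names are illustrative.

```python
# Values copied from the evaluation table, expressed as fractions.
precision = 0.7958
recall = 0.8536

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)
print(f"F1 = {f1:.2%}")  # prints "F1 = 82.37%"
```

The result agrees with the reported F1 Score of 82.37%, so the three rows are mutually consistent.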
Model Files
- config.json - Model configuration
- model.safetensors - Model weights in SafeTensors format
- tokenizer.json - Fast tokenizer
- vocab.txt - Vocabulary file
- special_tokens_map.json - Special tokens mapping
- tokenizer_config.json - Tokenizer configuration
License
Apache 2.0