FineMed Medical-Entity Extractor (FR)
🤗 Blog | 📄 Paper | 💻 Code | 🌐 FineMed | 🩺 DoctoBERT
📚 Introduction
This is the medical-entity extractor used to compute the medical-term density axis of FineMed-fr. Given a French medical document, it extracts medical-term spans under an 8-class UMLS-adapted taxonomy (disease, drug, body_part, …). The density is then the ratio of characters inside the extracted spans to the document's total characters.
It is a GLiNER2 model (mDeBERTa-v3 backbone, 512-token context) fine-tuned on LLM annotations. It is one of the three lightweight annotators behind FineMed-fr, which cover subdomain, educational quality, and medical-term density.
🚀 How to Use
```python
from gliner2 import GLiNER2

extractor = GLiNER2.from_pretrained("doctolib-lab/finemed-entity-extractor-fr")

# 8-class taxonomy; passing descriptions (not just the keys) improves extraction
labels = {
    "disease": "Pathological condition: disease, syndrome, infection, cancer, injury, symptom, clinical finding, mental disorder",
    "drug": "Chemical substance for therapy: prescription medication, vaccine, therapeutic compound, drug class, contrast agent",
    "body_part": "Anatomical structure: organ, tissue, bone, muscle, blood vessel, nerve, cell, body fluid, anatomical region",
    "medical_procedure": "Clinical action with methodology: surgery, diagnostic test, medical examination, laboratory test, imaging procedure",
    "molecular_marker": "Molecular entity or biochemical substance: gene, protein, enzyme, receptor, genetic variant, biochemical analyte",
    "clinical_device": "Manufactured medical object: surgical tool, implant, prosthetic, diagnostic scanner, monitoring equipment",
    "vital_function": "Physiological parameter name: heart rate, blood pressure, respiratory rate, temperature, oxygen saturation",
    "living_beings": "Non-human organism in biomedical context: bacterium, virus, fungus, parasite, pathogen, model organism",
}

text = "Le patient présente une pneumonie traitée par amoxicilline ..."
results = extractor.batch_extract_entities([text], labels, threshold=0.5)
print(results[0]["entities"])
# {"disease": ["pneumonie"], "drug": ["amoxicilline"], ...}
```
To reproduce FineMed's medical_entity_density, run extraction over the middle 512 tokens of each document, then divide the characters covered by the extracted spans by the document's total character count. Taking the middle window skips boilerplate at the document boundaries and keeps corpus-scale inference tractable.
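The density computation described above can be sketched in plain Python. This is a minimal illustration, not the FineMed pipeline itself: the helper names (`middle_window`, `find_spans`, `covered_chars`, `entity_density`) are hypothetical, the entities are assumed to arrive as a `{label: [surface strings]}` dict like the example output, and spans are recovered by string search because character offsets are not shown in that output. Overlapping spans are merged so that no character is counted twice.

```python
from typing import Dict, List, Optional, Tuple


def middle_window(tokens: List[str], size: int = 512) -> List[str]:
    """Return the centered window of at most `size` tokens."""
    if len(tokens) <= size:
        return list(tokens)
    start = (len(tokens) - size) // 2
    return tokens[start:start + size]


def find_spans(text: str, mentions: List[str]) -> List[Tuple[int, int]]:
    """Locate every occurrence of each mention as (start, end) offsets.

    Assumption: mentions are exact substrings of `text`.
    """
    spans = []
    for mention in set(mentions):
        if not mention:
            continue
        pos = text.find(mention)
        while pos != -1:
            spans.append((pos, pos + len(mention)))
            pos = text.find(mention, pos + 1)
    return spans


def covered_chars(spans: List[Tuple[int, int]]) -> int:
    """Count the characters covered by the union of the spans."""
    covered = 0
    last_end = 0
    for start, end in sorted(spans):
        start = max(start, last_end)  # skip the part already counted
        if end > start:
            covered += end - start
            last_end = end
    return covered


def entity_density(
    text: str,
    entities: Dict[str, List[str]],
    total_chars: Optional[int] = None,
) -> float:
    """Covered characters divided by the document's total characters.

    `total_chars` lets the caller pass the full document length when
    `text` is only the middle window that was sent to the extractor.
    """
    mentions = [m for group in entities.values() for m in group]
    denominator = len(text) if total_chars is None else total_chars
    if denominator == 0:
        return 0.0
    return covered_chars(find_spans(text, mentions)) / denominator
```

For example, `entity_density(window_text, results[0]["entities"], total_chars=len(document))` would combine the extractor output with the full document length, where `window_text` and `document` are placeholder variable names. How the middle window is tokenized in the original pipeline is not specified here, so `middle_window` simply operates on whatever token list it is given.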
🏷️ Entity Taxonomy
8 classes adapted from UMLS, keeping the medical-term-rich groups:
| class | covers |
|---|---|
| `disease` | disease, syndrome, infection, cancer, injury, symptom, clinical finding, mental disorder |
| `drug` | prescription medication, vaccine, therapeutic compound, drug class, contrast agent |
| `body_part` | organ, tissue, bone, muscle, blood vessel, nerve, cell, body fluid, anatomical region |
| `medical_procedure` | surgery, diagnostic test, medical examination, laboratory test, imaging procedure |
| `molecular_marker` | gene, protein, enzyme, receptor, genetic variant, biochemical analyte |
| `clinical_device` | surgical tool, implant, prosthetic, diagnostic scanner, monitoring equipment |
| `vital_function` | heart rate, blood pressure, respiratory rate, temperature, oxygen saturation |
| `living_beings` | bacterium, virus, fungus, parasite, pathogen, model organism |
🔧 Training
Fine-tuned from GLiNER2 on entity annotations produced by Qwen3-235B-A22B-Instruct via a two-pass self-review (Pass 1 extracts entities, Pass 2 reviews and corrects them) over roughly 300k documents. Best configuration: training prompts without per-class descriptions, inference prompts with descriptions. Entity-group order is shuffled during annotation to mitigate position bias. The two annotation prompts are in medical_entity_extract_prompt.txt (Pass 1) and medical_entity_review_prompt.txt (Pass 2).
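The two-pass annotation flow with shuffled entity-group order could look roughly like the sketch below. Everything here is illustrative: `annotate`, `shuffled_labels`, and the `llm` callable are hypothetical stand-ins, and the prompt strings are placeholders rather than the released prompt files. Reusing the same shuffled order in both passes is an arbitrary choice made for this sketch.

```python
import random
from typing import Callable, List

LABELS = [
    "disease", "drug", "body_part", "medical_procedure",
    "molecular_marker", "clinical_device", "vital_function", "living_beings",
]


def shuffled_labels(labels: List[str], rng: random.Random) -> List[str]:
    """Return a shuffled copy so that no class is always listed first."""
    order = list(labels)
    rng.shuffle(order)
    return order


def annotate(document: str, llm: Callable[[str], str], rng: random.Random) -> str:
    """Two-pass self-review: Pass 1 extracts, Pass 2 reviews and corrects."""
    groups = ", ".join(shuffled_labels(LABELS, rng))
    # Pass 1: placeholder extraction prompt (not the released prompt text).
    draft = llm(f"Extract entities for the groups [{groups}] from:\n{document}")
    # Pass 2: placeholder review prompt that sees both the document and the draft.
    return llm(
        f"Review and correct these entities for the groups [{groups}]:\n"
        f"{draft}\n\nDocument:\n{document}"
    )
```

Passing a seeded `random.Random` keeps the shuffling reproducible across runs, while still varying the group order from one document to the next.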
⚠️ Intended Use & Limitations
Built to annotate French medical web text at corpus scale (to build FineMed-fr), not for clinical decision-making. It is tuned for density estimation over a 512-token window, not exhaustive document-level entity recognition.
⚖️ License
Apache-2.0, inherited from the GLiNER2 base model.
🏛️ Acknowledgments
This work was granted access to the HPC resources of IDRIS (Jean Zay) under the allocations 2025-AD011016291 and 2026-A0200617487 made by GENCI.
Base model: fastino/gliner2-multi-v1