RadFinder

Links: Project page — Paper — Code — Models

Disease-Aware Vision–Language Pretraining for 3D CT

We pretrain a 3D CT vision–language model on 159k report–volume pairs with two new supervision signals: prompt-based disease labels for classification and intra-scan snippet localization for axial depth grounding. The resulting single model reaches state-of-the-art retrieval on CT-RATE, competitive disease classification, and slice-level localization at 12 mm resolution.
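The acknowledgements note that the training framework is SigLIP-based. For readers unfamiliar with that family of objectives, the following is a minimal, dependency-free sketch of a pairwise sigmoid contrastive loss over matched volume and report embeddings. It is an illustration under stated assumptions, not the RadFinder implementation: the function name, the argument names, and the fixed `scale` and `bias` values are hypothetical (in SigLIP-style training these two are typically learnable), and the disease-label and localization signals are not modelled here.

```python
import math


def _normalize(vec):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def _softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def sigmoid_contrastive_loss(volume_embs, report_embs, scale=10.0, bias=-10.0):
    """Pairwise sigmoid loss for a batch of matched (volume, report) pairs.

    Row i of volume_embs is assumed to belong to row i of report_embs.
    scale and bias are illustrative constants (assumption), not values
    taken from the RadFinder code.
    """
    vols = [_normalize(v) for v in volume_embs]
    reps = [_normalize(r) for r in report_embs]
    total = 0.0
    for i, vol in enumerate(vols):
        for j, rep in enumerate(reps):
            logit = scale * sum(a * b for a, b in zip(vol, rep)) + bias
            label = 1.0 if i == j else -1.0  # +1 for matched pairs, -1 otherwise
            # -log(sigmoid(label * logit)) written as a softplus
            total += _softplus(-label * logit)
    # Normalize by batch size, as in the SigLIP formulation
    return total / len(vols)
```

Every volume–report pair in the batch is treated as an independent binary decision, so no softmax normalization across the batch is needed. Correctly aligned batches should yield a lower loss than batches whose reports have been shuffled.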

Usage

See the GitHub repository.

Training data

RefCT (internal): ~98k report–volume pairs from ~50k patients at a single hospital; in-house clinical data, not publicly released.
CT-RATE (CC BY-NC-SA 4.0)
Merlin (Stanford AIMI non-commercial research DUA)
INSPECT (Stanford AIMI non-commercial research DUA)

Further acknowledgements

The model and parts of the SigLIP training framework in src/radfinder are based on SPECTRE.
The text processing pipeline in src/rate, which creates binary labels from text reports, is based on RATE.
We thank the maintainers of MONAI, timm, and Hugging Face transformers for their libraries, as well as the maintainers of all other packages listed in requirements.txt.
The demo scan under assets/demo/s0859/ is case s0859 from TotalSegmentator v2 (Wasserthal et al., CC-BY-4.0).
Funding, additional acknowledgements, full citations: see paper.

License

All code is MIT (see LICENSE) unless a file header says otherwise. Files in src/rate/ that carry a # Vendored from YalaLab/rate ... (ECL 2.0) header are derivatives of the upstream rate package and are licensed under ECL 2.0 (see LICENSE_RATE).
RadFinder model weights are licensed under CC BY-NC-SA 4.0 (see LICENSE_MODELS).
- Note: the weights are subject to the original dataset licenses. Users intending to use RadFinder in commercial settings should verify dataset and model licensing and obtain any required permissions.

Citation

If you use this code, models, or results, please cite:

@inproceedings{ging2026radfinder,
  author    = {Simon Ging and Philipp Arnold and Sebastian Walter and Hani Alnahas and Hannah Bast and Elmar Kotter and Jiancheng Yang and Behzad Bozorgtabar and Thomas Brox},
  title     = {Learning to Read Where to Look: Disease-Aware Vision--Language Pretraining for 3{D} {CT}},
  booktitle = {Medical Image Computing and Computer Assisted Intervention -- {MICCAI} 2026, Strasbourg, France, September 27 -- October 1, 2026, Proceedings},
  series    = {Lecture Notes in Computer Science},
  publisher = {Springer},
  year      = {2026},
  note      = {To appear},
}