Dataset: ibrahimhamamci/CT-RATE
Links: Project page — Paper — Code — Models
Disease-Aware Vision–Language Pretraining for 3D CT
We pretrain a 3D CT vision–language model on 159k report–volume pairs with two new supervision signals: prompt-based disease labels for classification and intra-scan snippet localization for axial depth grounding. A single unified model achieves state-of-the-art retrieval on CT-RATE, competitive disease classification, and slice-level localization at 12 mm resolution.
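Retrieval on CT-RATE pairs each CT volume with its report. A common way to score such retrieval is Recall@K: for each volume, rank all reports by embedding similarity and count a hit if the paired report lands in the top K. The following is a minimal sketch under stated assumptions, not the repository's evaluation code: it assumes cosine similarity, assumes pair i shares index i in both lists, and uses hypothetical toy 2-D embeddings. The names cosine and recall_at_k are illustrative only.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def recall_at_k(queries: List[List[float]], targets: List[List[float]], k: int) -> float:
    """Fraction of queries whose paired target ranks in the top k.

    Assumption: queries[i] and targets[i] are the ground-truth pair.
    """
    hits = 0
    for i, q in enumerate(queries):
        scores = [cosine(q, t) for t in targets]
        # Target indices sorted by decreasing similarity to query i.
        ranking = sorted(range(len(targets)), key=lambda j: scores[j], reverse=True)
        if i in ranking[:k]:
            hits += 1
    return hits / len(queries)


# Hypothetical toy embeddings: three volume/report pairs, pair i shares index i.
volumes = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]
reports = [[0.9, 0.1], [0.1, 0.9], [1.0, 0.3]]

print(recall_at_k(volumes, reports, k=1))  # → 0.6666666666666666 (volume 2's report ranks second)
print(recall_at_k(volumes, reports, k=2))  # → 1.0
```

Calling recall_at_k(reports, volumes, k) scores the report-to-volume direction instead. The nested loop is written for readability; an evaluation over thousands of scans would more likely compute the full similarity matrix in one batched operation.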
See the GitHub repository.
… src/radfinder are based on SPECTRE.
src/rate is used to create binary labels from text reports and is based on RATE.
… requirements.txt
assets/demo/s0859/ is case s0859 from TotalSegmentator v2 (Wasserthal et al., CC-BY-4.0).
… (see LICENSE) unless a file header says otherwise. Files in src/rate/ that carry a # Vendored from YalaLab/rate ... (ECL 2.0) header are derivatives of the upstream rate package and are licensed under ECL 2.0 (see LICENSE_RATE).
… LICENSE_MODELS.
If you use this code, models, or results, please cite:
@inproceedings{ging2026radfinder,
author = {Simon Ging and Philipp Arnold and Sebastian Walter and Hani Alnahas and Hannah Bast and Elmar Kotter and Jiancheng Yang and Behzad Bozorgtabar and Thomas Brox},
title = {Learning to Read Where to Look: Disease-Aware Vision--Language Pretraining for 3{D} {CT}},
booktitle = {Medical Image Computing and Computer Assisted Intervention -- {MICCAI} 2026, Strasbourg, France, September 27 -- October 1, 2026, Proceedings},
series = {Lecture Notes in Computer Science},
publisher = {Springer},
year = {2026},
note = {To appear},
}
Base model: cclaess/SPECTRE