You need to agree to share your contact information to access this model

This repository is publicly accessible, but you have to accept the conditions to access its files and content.

Log in or Sign Up to review the conditions and access this model content.

DentVLM: A Multimodal Vision-Language Model for Comprehensive Dental Diagnosis and Enhanced Clinical Practice

DentVLM is a multimodal vision-language model for dental image understanding and diagnosis-oriented question answering. It supports dental image-question inputs for research tasks including malocclusion recognition, dental disease recognition, and region-aware dental image analysis. The model is released as a research artifact to support reproducibility and further academic research.

Model Access

The DentVLM model weights are publicly available from this Hugging Face model repository under the Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0).

Please cite the associated manuscript and this repository when using DentVLM.

Source Code and Reproducibility

The source code, training scripts, inference scripts, evaluation scripts, and example data format are available at:

https://github.com/ZJUI-AI4H/DentVLM

The GitHub repository includes instructions for environment setup, model loading, inference, and evaluation. Users should refer to the repository documentation for the exact software versions and command-line examples used in the associated study.

Intended Use

DentVLM is intended for:

  • Academic and non-commercial research on dental multimodal vision-language modeling
  • Dental image understanding and question-answering research
  • Reproduction and extension of the DentVLM training, inference, and evaluation pipeline
  • Benchmarking on dental multimodal tasks
  • Development of research workflows for dental AI evaluation

Not Intended For

DentVLM is not intended or approved for:

  • Use as the sole basis for clinical diagnosis, treatment planning, triage, or patient management
  • Emergency medical or dental decision-making
  • Autonomous or automated clinical decision-making without appropriate validation, professional oversight, and regulatory approval
  • Commercial use of the released model weights without separate permission from the rights holder
  • Unlawful, harmful, privacy-invasive, or unethical applications

Limitations

  • DentVLM is developed as a research model for dental image understanding and diagnosis-oriented question answering.
  • Model outputs should be interpreted in the context of professional expertise and task-specific evaluation.
  • Model performance may vary with image quality, imaging modality, acquisition conditions, patient population, annotation standards, and prompt formulation.
  • Performance in new clinical environments, imaging protocols, or patient populations may differ from the results reported in the associated study.
  • Users are responsible for conducting appropriate validation before any downstream research or translational use.

Ethical Considerations

Users should ensure that all dental images and associated data are collected, processed, stored, and used in compliance with applicable privacy, consent, institutional review, and data protection requirements.

Users should not use DentVLM to attempt to identify, re-identify, or infer sensitive information about any individual. The model should not be used for automated clinical decision-making without appropriate validation, professional oversight, and regulatory approval.

License

DentVLM model weights

The DentVLM fine-tuned model weights and DentVLM-specific model release materials are licensed under the Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0), unless otherwise stated.

Under CC BY-NC 4.0, the released model weights may be used, shared, and adapted for non-commercial purposes, provided that appropriate attribution is given and license terms are followed.

Source code

The DentVLM source code, including training, inference, and evaluation scripts, is released in the GitHub repository under the Apache License 2.0, unless otherwise stated in individual files.

Upstream components

DentVLM is built on Qwen/Qwen2-VL-7B-Instruct. Qwen2-VL-7B-Instruct is released by Alibaba Cloud under the Apache License 2.0. Third-party components, including Qwen2-VL, LLaMA-Factory, vLLM, and their associated files, remain subject to their original licenses and notices.

This model release does not grant rights to use any third-party trademarks or protected clinical data.

Citation

If you use DentVLM, please cite the associated manuscript and this model repository:

@article{meng2025dentvlm,
  title={Dentvlm: A multimodal vision-language model for comprehensive dental diagnosis and enhanced clinical practice},
  author={Meng, Zijie and Hao, Jin and Dai, Xiwei and Feng, Yang and Liu, Jiaxiang and Feng, Bin and Wu, Huikai and Gai, Xiaotang and Zhu, Hengchuan and Hu, Tianxiang and others},
  journal={arXiv preprint arXiv:2509.23344},
  year={2025}
}
Downloads last month
-
Safetensors
Model size
8B params
Tensor type
BF16
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for ZJU-AI4H/DentVLM

Base model

Qwen/Qwen2-VL-7B
Finetuned
(604)
this model

Paper for ZJU-AI4H/DentVLM