Instructions for using ZJU-AI4H/DentVLM with libraries and local apps.

Libraries

How to use ZJU-AI4H/DentVLM with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="ZJU-AI4H/DentVLM")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForMultimodalLM

processor = AutoProcessor.from_pretrained("ZJU-AI4H/DentVLM")
model = AutoModelForMultimodalLM.from_pretrained("ZJU-AI4H/DentVLM")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
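inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens (everything after the prompt) and drop
# special tokens such as the end-of-turn marker.
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))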

Local Apps

vLLM

How to use ZJU-AI4H/DentVLM with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "ZJU-AI4H/DentVLM"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "ZJU-AI4H/DentVLM",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
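The same request can also be sent from Python. The sketch below uses only the standard library. It goes one step beyond the curl example by sending a local image as a base64 `data:` URL instead of a public web link, which keeps patient images off third-party hosts. It assumes the vLLM server above is listening on port 8000 and, like the OpenAI chat format, accepts base64 data URLs in `image_url`. The helper names (`image_to_data_url`, `build_request`, `post_json`) and the file name `example_xray.jpg` are hypothetical.

```python
import base64
import json
import mimetypes
import urllib.request
from pathlib import Path

# Assumed endpoint of the vLLM server started above (default port 8000).
API_URL = "http://localhost:8000/v1/chat/completions"
MODEL_ID = "ZJU-AI4H/DentVLM"


def image_to_data_url(path: str) -> str:
    """Encode a local image file as a base64 data URL (hypothetical helper)."""
    mime, _ = mimetypes.guess_type(path)
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime or 'application/octet-stream'};base64,{encoded}"


def build_request(image_path: str, question: str, max_tokens: int = 256) -> dict:
    """Build an OpenAI-style chat payload with one question and one image."""
    return {
        "model": MODEL_ID,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": question},
                    {"type": "image_url", "image_url": {"url": image_to_data_url(image_path)}},
                ],
            }
        ],
        "max_tokens": max_tokens,
    }


def post_json(url: str, payload: dict, timeout: float = 120.0) -> dict:
    """POST a JSON payload and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example (requires a running server and a real image file):
# payload = build_request("example_xray.jpg", "Describe this image in one sentence.")
# print(post_json(API_URL, payload)["choices"][0]["message"]["content"])
```

Encoding the image on the client side means the server never has to fetch a URL. The trade-off is a larger request body, roughly 4/3 the size of the image file.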

Use Docker

docker model run hf.co/ZJU-AI4H/DentVLM

SGLang

How to use ZJU-AI4H/DentVLM with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "ZJU-AI4H/DentVLM" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "ZJU-AI4H/DentVLM",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
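The SGLang server exposes the same OpenAI-compatible routes as vLLM, so one client can target either backend by changing only the base URL (port 30000 here, 8000 for vLLM). A 7B checkpoint takes a while to load. The sketch below therefore polls the `GET /v1/models` route until the server answers before any request is sent. It then extracts the reply text from the `choices[0].message.content` field of a chat-completion response. The helper names, timeout, and polling interval are assumptions for illustration and are not part of either project's tooling.

```python
import http.client
import json
import time
import urllib.request

# Assumed base URL of the SGLang server started above; use port 8000 for vLLM.
BASE_URL = "http://localhost:30000"


def wait_until_ready(base_url: str, timeout: float = 600.0, interval: float = 5.0) -> bool:
    """Poll GET /v1/models until it returns HTTP 200 or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(f"{base_url}/v1/models", timeout=interval) as resp:
                if resp.status == 200:
                    return True
        except (OSError, http.client.HTTPException):
            # Connection refused, HTTP errors (e.g. 503 while loading) and socket
            # timeouts all land here; treat them as "not ready yet".
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def extract_reply(response: dict) -> str:
    """Return the assistant text from an OpenAI-style chat completion response."""
    choices = response.get("choices") or []
    if not choices:
        raise ValueError(f"response has no choices: {json.dumps(response)[:200]}")
    return choices[0]["message"]["content"]
```

In a script, call `wait_until_ready(BASE_URL)` once after launching the server. Then pass each decoded JSON response to `extract_reply`.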

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "ZJU-AI4H/DentVLM" \
        --host 0.0.0.0 \
        --port 30000
# Call the server with the same curl request shown in the pip-install example above (port 30000).

Docker Model Runner
How to use ZJU-AI4H/DentVLM with Docker Model Runner:
```
docker model run hf.co/ZJU-AI4H/DentVLM
```

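Access to this model is gated. The repository is publicly visible, but you must agree to share your contact information and accept the access conditions before you can download its files. The download steps above therefore require a Hugging Face access token for an account that has accepted the conditions (for example, supplied through the HF_TOKEN environment variable).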

DentVLM: A Multimodal Vision-Language Model for Comprehensive Dental Diagnosis and Enhanced Clinical Practice

DentVLM is a multimodal vision-language model for dental image understanding and diagnosis-oriented question answering. It takes a dental image together with a natural-language question and supports research tasks including malocclusion recognition, dental disease recognition, and region-aware dental image analysis. The model is released as a research artifact to support reproducibility and further academic research.

Model Access

The DentVLM model weights are publicly available from this Hugging Face model repository under the Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0).

Please cite the associated manuscript and this repository when using DentVLM.

Source Code and Reproducibility

The source code, training scripts, inference scripts, evaluation scripts, and example data format are available at:

https://github.com/ZJUI-AI4H/DentVLM

The GitHub repository includes instructions for environment setup, model loading, inference, and evaluation. Users should refer to the repository documentation for the exact software versions and command-line examples used in the associated study.

Intended Use

DentVLM is intended for:

Academic and non-commercial research on dental multimodal vision-language modeling
Dental image understanding and question-answering research
Reproduction and extension of the DentVLM training, inference, and evaluation pipeline
Benchmarking on dental multimodal tasks
Development of research workflows for dental AI evaluation

Not Intended For

DentVLM is not intended or approved for:

Use as the sole basis for clinical diagnosis, treatment planning, triage, or patient management
Emergency medical or dental decision-making
Autonomous or automated clinical decision-making without appropriate validation, professional oversight, and regulatory approval
Commercial use of the released model weights without separate permission from the rights holder
Unlawful, harmful, privacy-invasive, or unethical applications

Limitations

DentVLM is developed as a research model for dental image understanding and diagnosis-oriented question answering.
Model outputs should be interpreted in the context of professional expertise and task-specific evaluation.
Model performance may vary with image quality, imaging modality, acquisition conditions, patient population, annotation standards, and prompt formulation.
Performance in new clinical environments, imaging protocols, or patient populations may differ from the results reported in the associated study.
Users are responsible for conducting appropriate validation before any downstream research or translational use.
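One way to act on these limitations is to report results per subgroup rather than as a single overall score. That way, a drop on one imaging modality is not hidden by the average. The sketch below assumes each evaluation record is a dictionary with hypothetical `label` and `prediction` fields, plus a grouping field such as `modality`. It scores free-text answers by exact match after collapsing whitespace and lower-casing. That scoring rule is a simplifying assumption and is not necessarily the metric used in the associated study.

```python
from collections import defaultdict
from typing import Iterable, Mapping


def normalize(answer: str) -> str:
    """Simplified answer normalization (assumption): collapse whitespace, lower-case."""
    return " ".join(answer.split()).lower()


def accuracy_by_group(
    records: Iterable[Mapping[str, str]], group_key: str
) -> dict[str, tuple[float, int]]:
    """Return {group value: (exact-match accuracy, number of records)}."""
    correct: dict[str, int] = defaultdict(int)
    total: dict[str, int] = defaultdict(int)
    for record in records:
        group = record[group_key]
        total[group] += 1
        if normalize(record["prediction"]) == normalize(record["label"]):
            correct[group] += 1
    return {group: (correct[group] / total[group], total[group]) for group in total}


# Toy records with made-up values, only to show the output shape.
records = [
    {"modality": "panoramic", "label": "Class II", "prediction": "class ii"},
    {"modality": "panoramic", "label": "Class I", "prediction": "Class III"},
    {"modality": "intraoral", "label": "caries", "prediction": " Caries "},
]
for group, (acc, n) in sorted(accuracy_by_group(records, "modality").items()):
    print(f"{group}: accuracy={acc:.2f} (n={n})")
# intraoral: accuracy=1.00 (n=1)
# panoramic: accuracy=0.50 (n=2)
```

Reporting the count next to each accuracy matters because small subgroups give unstable estimates. Swapping `"modality"` for another field, such as acquisition site or prompt variant, checks a different source of variation.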

Ethical Considerations

Users should ensure that all dental images and associated data are collected, processed, stored, and used in compliance with applicable privacy, consent, institutional review, and data protection requirements.

Users should not use DentVLM to attempt to identify, re-identify, or infer sensitive information about any individual. The model should not be used for automated clinical decision-making without appropriate validation, professional oversight, and regulatory approval.

License

DentVLM model weights

The DentVLM fine-tuned model weights and DentVLM-specific model release materials are licensed under the Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0), unless otherwise stated.

Under CC BY-NC 4.0, the released model weights may be used, shared, and adapted for non-commercial purposes, provided that appropriate attribution is given and license terms are followed.

Source code

The DentVLM source code, including training, inference, and evaluation scripts, is released in the GitHub repository under the Apache License 2.0, unless otherwise stated in individual files.

Upstream components

DentVLM is built on Qwen/Qwen2-VL-7B-Instruct. Qwen2-VL-7B-Instruct is released by Alibaba Cloud under the Apache License 2.0. Third-party components, including Qwen2-VL, LLaMA-Factory, vLLM, and their associated files, remain subject to their original licenses and notices.

This model release does not grant rights to use any third-party trademarks or protected clinical data.

Citation

If you use DentVLM, please cite the associated manuscript and this model repository:

@article{meng2025dentvlm,
  title={{DentVLM}: A Multimodal Vision-Language Model for Comprehensive Dental Diagnosis and Enhanced Clinical Practice},
  author={Meng, Zijie and Hao, Jin and Dai, Xiwei and Feng, Yang and Liu, Jiaxiang and Feng, Bin and Wu, Huikai and Gai, Xiaotang and Zhu, Hengchuan and Hu, Tianxiang and others},
  journal={arXiv preprint arXiv:2509.23344},
  year={2025}
}