Instructions for using tencent/HunyuanOCR with libraries, notebooks, and local apps.

Libraries

How to use tencent/HunyuanOCR with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="tencent/HunyuanOCR")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoModelForSeq2SeqLM
model = AutoModelForSeq2SeqLM.from_pretrained("tencent/HunyuanOCR", dtype="auto")

Local Apps

vLLM

How to use tencent/HunyuanOCR with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "tencent/HunyuanOCR"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "tencent/HunyuanOCR",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
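
The same request can be sent from Python. The following is a minimal sketch using only the standard library, assuming the server started above is listening on `http://localhost:8000`. The helper names `build_chat_request` and `chat` are hypothetical and are not part of vLLM.

```python
import json
import urllib.request


def build_chat_request(image_url, prompt, model="tencent/HunyuanOCR"):
    """Build the JSON body used by the curl example above."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }


def chat(base_url, payload, timeout=120):
    """POST the payload to /v1/chat/completions and return the reply text."""
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # OpenAI-compatible servers return the text under choices[0].message.content.
    return body["choices"][0]["message"]["content"]
```

With the server running, `chat("http://localhost:8000", build_chat_request(url, "Describe this image in one sentence."))` should return the model's reply as a string.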

Use Docker

docker model run hf.co/tencent/HunyuanOCR

SGLang

How to use tencent/HunyuanOCR with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "tencent/HunyuanOCR" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "tencent/HunyuanOCR",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
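
For OCR the input is often a local scan rather than a public URL. A possible approach, sketched below, is to embed the file as a base64 data URL in the `image_url` field. This assumes the server accepts data URLs there, as OpenAI-compatible servers commonly do. The helper names `image_to_data_url` and `local_image_part` are hypothetical.

```python
import base64
import mimetypes


def image_to_data_url(path):
    """Encode a local image file as a data URL (data:<mime>;base64,<payload>)."""
    mime_type, _ = mimetypes.guess_type(path)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"cannot infer an image MIME type from {path!r}")
    with open(path, "rb") as handle:
        encoded = base64.b64encode(handle.read()).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def local_image_part(path):
    """Build a content entry that can replace the image_url entry in the request above."""
    return {"type": "image_url", "image_url": {"url": image_to_data_url(path)}}
```

The returned dictionary goes into the `content` list in place of the entry that points at a remote image.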

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "tencent/HunyuanOCR" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "tencent/HunyuanOCR",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Docker Model Runner
How to use tencent/HunyuanOCR with Docker Model Runner:
```
docker model run hf.co/tencent/HunyuanOCR
```

Monkeypatch for error only one element tensors can be converted to Python scalars

#10

by lastmass - opened Nov 26, 2025

Adding a monkeypatch for "ValueError: only one element tensors can be converted to Python scalars":

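# --- Monkeypatch Start ---
import numpy as np
import torch
from torchvision import transforms

# Assumption: the original post does not show where image_processing_hunyuan_vl
# comes from. Import it from wherever your install defines HunYuanVLImageProcessor.
from transformers.models.hunyuan_vl import image_processing_hunyuan_vl
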
def _preprocess_patched(
    self,
    images,
    videos=None,
    do_resize=None,
    size=None,
    min_pixels=None,
    max_pixels=None,
    resample=None,
    do_rescale=None,
    rescale_factor=None,
    do_normalize=None,
    image_mean=None,
    image_std=None,
    patch_size=None,
    temporal_patch_size=None,
    merge_size=None,
    do_convert_rgb=None,
    return_tensors=None,
    data_format=None,
    input_data_format=None,
):
    # Imports from the module
    smart_resize = image_processing_hunyuan_vl.smart_resize
    make_list_of_images = image_processing_hunyuan_vl.make_list_of_images
    convert_to_rgb = image_processing_hunyuan_vl.convert_to_rgb

    images = make_list_of_images(images)

    if do_convert_rgb:
        images = [convert_to_rgb(image) for image in images]

    width, height = images[0].width, images[0].height
    resized_width, resized_height = width, height
    processed_images = []
    for image in images:
        if do_resize:
            resized_width, resized_height = smart_resize(
                width,
                height,
                factor=patch_size * merge_size,
                min_pixels=size["shortest_edge"],
                max_pixels=size["longest_edge"],
            )
            image = image.resize((resized_width, resized_height))

        if do_normalize:
            image = transforms.Compose([
                transforms.ToTensor(),
                transforms.Normalize(self.image_mean, self.image_std)
            ])(image)
        processed_images.append(image)

    # FIX: Convert tensors to numpy arrays before creating the main array
    # Check if elements are tensors and convert if so
    if processed_images and isinstance(processed_images[0], torch.Tensor):
        patches = np.array([img.numpy() for img in processed_images])
    else:
        patches = np.array(processed_images)
    
    channel = patches.shape[1]
    grid_t = patches.shape[0] // temporal_patch_size
    grid_h, grid_w = resized_height // patch_size, resized_width // patch_size
    patches = patches.reshape(
        1,
        channel,
        grid_h // merge_size,
        merge_size,
        patch_size,
        grid_w // merge_size,
        merge_size,
        patch_size,
    )
    patches = patches.transpose(0, 2, 3, 5, 6, 1, 4, 7)
    flatten_patches = patches.reshape(grid_h * grid_w, channel * patch_size * patch_size)

    return flatten_patches, (grid_t, grid_h, grid_w)

print("Applying monkeypatch to HunYuanVLImageProcessor._preprocess...")
image_processing_hunyuan_vl.HunYuanVLImageProcessor._preprocess = _preprocess_patched
# --- Monkeypatch End ---
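
To see what the reshape and transpose at the end of the patch produce, here is a NumPy-only sketch of that step on a dummy array. The function name `flatten_into_patches` and the example sizes are hypothetical. The axis order is copied from the patch above, and the observation that rows come out in raster order follows from that transpose rather than from anything stated in the post.

```python
import numpy as np


def flatten_into_patches(frames, patch_size, merge_size):
    """Split a (1, C, H, W) array into one flattened row per patch.

    H and W must be multiples of patch_size * merge_size.
    """
    _, channel, height, width = frames.shape
    grid_h, grid_w = height // patch_size, width // patch_size
    patches = frames.reshape(
        1,
        channel,
        grid_h // merge_size,
        merge_size,
        patch_size,
        grid_w // merge_size,
        merge_size,
        patch_size,
    )
    # Move the grid axes to the front and keep (channel, patch row, patch col) last.
    patches = patches.transpose(0, 2, 3, 5, 6, 1, 4, 7)
    return patches.reshape(grid_h * grid_w, channel * patch_size * patch_size)


# Dummy 3-channel image, 64 pixels high and 96 pixels wide.
frames = np.arange(3 * 64 * 96, dtype=np.float32).reshape(1, 3, 64, 96)
flat = flatten_into_patches(frames, patch_size=16, merge_size=2)
print(flat.shape)  # (24, 768): a 4 x 6 grid of patches, each 3 * 16 * 16 values
```

With this axis order, row `k` holds the patch at grid position `(k // grid_w, k % grid_w)`, so row 0 is the top-left 16 x 16 block of all three channels.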
