Image resizing for vision encoder ONNX export

#75

by Jrd100 - opened Jul 2, 2025

Hi,

We're trying to export Moondream's vision encoder to ONNX but are running into a shape mismatch in the patch_emb layer:

RuntimeError: mat1 and mat2 shapes cannot be multiplied (672x224 and 588x1152).
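The numbers in this error can be broken down with a little arithmetic. Here is a minimal sketch. It assumes patch_emb is a linear layer over flattened RGB patches, which the message suggests but does not confirm. Under that assumption, 588 factors as 3 × 14 × 14, and 672 × 224 matches what a 3 × 224 × 224 image looks like when its last axis is fed directly into a linear layer. The helper names below are hypothetical.

```python
# Sanity check on the shapes reported in the RuntimeError.
# Assumption: patch_emb is a linear layer whose 588 inputs are flattened RGB patches.
import math

CHANNELS = 3  # RGB input after ToTensor()

def infer_patch_size(patch_dim: int, channels: int = CHANNELS) -> int:
    """Return p such that channels * p * p == patch_dim, or raise ValueError."""
    pixels, rem = divmod(patch_dim, channels)
    p = math.isqrt(pixels)
    if rem or p * p != pixels:
        raise ValueError(f"{patch_dim} is not {channels} * p * p for any integer p")
    return p

def raw_operand_shape(channels: int, height: int, width: int) -> tuple:
    """2D shape a linear layer sees when a (C, H, W) tensor reaches it unpatched."""
    return (channels * height, width)

mat1 = raw_operand_shape(CHANNELS, 224, 224)
mat2 = (588, 1152)
print(mat1)                       # (672, 224): 3 * 224 rows, 224 features each
print(infer_patch_size(mat2[0]))  # 14, since 588 = 3 * 14 * 14
print(mat1[1] == mat2[0])         # False: 224 features vs. 588 expected
```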

We also tried (378, 378) and (384, 384) and got a similar error.
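Suppose the patch size is 14. This is an assumption inferred from 588 = 3 × 14 × 14. Then 224 and 378 both divide into a whole grid of patches, while 384 does not. Getting the same error at every size therefore suggests the problem may lie in the input layout rather than the resize target. A minimal check, using a hypothetical `patch_grid` helper:

```python
# Assumption: patch size 14, inferred from 588 = 3 * 14 * 14 in the error message.
PATCH = 14

def patch_grid(size: int, patch: int = PATCH):
    """Return (patches_per_side, total_patches), or None if size is not a multiple of patch."""
    if size % patch:
        return None
    side = size // patch
    return side, side * side

for size in (224, 378, 384):
    print(size, patch_grid(size))
# 224 (16, 256)
# 378 (27, 729)
# 384 None
```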

Code snippet:

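    from torchvision import transforms

    # Preprocessing pipeline applied to each input image before the vision encoder
    preprocess = transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor(),  # PIL image -> float tensor of shape (3, H, W) in [0, 1]
        transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]),  # -> [-1, 1]
    ])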
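One preprocessing step that may be missing is splitting the normalized image into patches before patch_emb. Below is a minimal NumPy sketch of that step. It assumes a 14 × 14 patch and a per-patch (channel, row, column) flattening order. `patchify` is a hypothetical helper, and the actual flattening order would have to match Moondream's own code. The zero weight matrix only stands in for the 588 × 1152 operand from the error; it is not the model's patch_emb.

```python
# Hypothetical patchify step: turns a (C, H, W) image into a sequence of flattened
# patches so the last axis matches the 588 inputs patch_emb appears to expect.
# Assumptions: patch size 14 and (c, p1, p2) flattening order within each patch.
import numpy as np

def patchify(image: np.ndarray, patch: int = 14) -> np.ndarray:
    c, h, w = image.shape
    if h % patch or w % patch:
        raise ValueError(f"image {h}x{w} is not a multiple of patch size {patch}")
    gh, gw = h // patch, w // patch
    # (C, gh, p, gw, p) -> (gh, gw, C, p, p) -> (gh * gw, C * p * p)
    x = image.reshape(c, gh, patch, gw, patch)
    x = x.transpose(1, 3, 0, 2, 4)
    return x.reshape(gh * gw, c * patch * patch)

image = np.zeros((3, 224, 224), dtype=np.float32)  # stand-in for the transformed tensor
patches = patchify(image)
print(patches.shape)  # (256, 588)

weight = np.zeros((588, 1152), dtype=np.float32)  # placeholder with the shape from the error
print((patches @ weight).shape)  # (256, 1152): the inner dimensions now agree
```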

Can you please clarify:

- The correct input image size for the vision encoder?
- The patch size used?
- The expected flattened patch dimension for patch_emb?
- Any required preprocessing steps we might be missing?

This will help us align our preprocessing with the model's architecture.

Thanks
