Higher-resolution version of siglip2_decoder

by szlgallen - opened Mar 24

Hi! Thank you for your great work!

I noticed that the current open-sourced version only supports 224×224 image generation. May I ask if there are any plans to release a higher-resolution SigLIP2 decoder, similar to the DINOv2-B_512 configuration in RAE?

Additionally, is there any approach that would allow the current model to directly support higher-resolution image decoding (e.g., 512×512)? One advantage of VAEs is that they typically do not require training separate models for different input resolutions, so I was wondering if a similar flexibility could be achieved here.
