Higher-resolution version of siglip2_decoder
#3
by szlgallen - opened
Hi! Thank you for your great work!
I noticed that the current open-sourced version only supports 224×224 image generation. May I ask if there are any plans to release a higher-resolution SigLIP2 decoder, similar to the DINOv2-B_512 configuration in RAE?
Additionally, is there any approach that would allow the current model to directly support higher-resolution image decoding (e.g., 512×512)? One advantage of VAEs is that they typically do not require training separate models for different input resolutions, so I was wondering if a similar flexibility could be achieved here.