How to use callgg/fastvlm-caption with Diffusers:

    pip install -U diffusers transformers accelerate

    import torch
    from diffusers import DiffusionPipeline

    # switch to "mps" for apple devices
    pipe = DiffusionPipeline.from_pretrained("callgg/fastvlm-caption", dtype=torch.bfloat16, device_map="cuda")
    prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
    image = pipe(prompt).images[0]
{
  "crop_size": {
    "height": 1024,
    "width": 1024
  },
  "do_center_crop": true,
  "do_convert_rgb": true,
  "do_normalize": true,
  "do_rescale": true,
  "do_resize": true,
  "image_mean": [
    0.0,
    0.0,
    0.0
  ],
  "image_processor_type": "CLIPImageProcessor",
  "image_std": [
    1.0,
    1.0,
    1.0
  ],
  "processor_class": "LlavaProcessor",
  "resample": 3,
  "rescale_factor": 0.00392156862745098,
  "size": {
    "shortest_edge": 1024
  }
}
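
To make the settings above concrete, here is a minimal pure-Python sketch of the geometry and pixel arithmetic they describe: resize so the shortest edge is 1024, center-crop to 1024x1024, multiply pixel values by the rescale factor (approximately 1/255), then normalize with mean 0.0 and std 1.0, which leaves the rescaled values unchanged. The function names are hypothetical and the rounding choices are assumptions; the actual CLIPImageProcessor may differ in detail. The value "resample": 3 corresponds to bicubic interpolation in Pillow and is not modeled here.

```python
# Values copied from the config above.
SHORTEST_EDGE = 1024
CROP_W, CROP_H = 1024, 1024
RESCALE_FACTOR = 0.00392156862745098  # approximately 1/255
IMAGE_MEAN = (0.0, 0.0, 0.0)
IMAGE_STD = (1.0, 1.0, 1.0)


def resized_size(width, height, shortest_edge=SHORTEST_EDGE):
    """Return (width, height) after scaling the shorter side to shortest_edge.

    Assumption: the longer side is truncated to an integer.
    """
    if width <= height:
        return shortest_edge, int(shortest_edge * height / width)
    return int(shortest_edge * width / height), shortest_edge


def center_crop_box(width, height, crop_w=CROP_W, crop_h=CROP_H):
    """Return the (left, top, right, bottom) box of a centered crop."""
    left = (width - crop_w) // 2
    top = (height - crop_h) // 2
    return left, top, left + crop_w, top + crop_h


def rescale_and_normalize(rgb):
    """Map one 0-255 RGB pixel to floats: (value * factor - mean) / std."""
    return tuple(
        (c * RESCALE_FACTOR - m) / s
        for c, m, s in zip(rgb, IMAGE_MEAN, IMAGE_STD)
    )


# Example: a 1920x1080 frame.
w, h = resized_size(1920, 1080)
print((w, h))                 # (1820, 1024)
print(center_crop_box(w, h))  # (398, 0, 1422, 1024)
print(rescale_and_normalize((255, 0, 128)))
```

Because the mean is all zeros and the std is all ones, the normalize step is effectively a no-op here, so the model input ends up in the range 0 to 1.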