Instructions to use google/gemma-3-4b-it with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use google/gemma-3-4b-it with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="google/gemma-3-4b-it")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("google/gemma-3-4b-it")
model = AutoModelForImageTextToText.from_pretrained("google/gemma-3-4b-it")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use google/gemma-3-4b-it with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "google/gemma-3-4b-it"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "google/gemma-3-4b-it",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
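
The same request can also be sent from Python. The sketch below is a minimal illustration that uses only the standard library. The helper names `build_chat_request` and `post_chat_request` are hypothetical, and it assumes the vLLM server started above is listening on http://localhost:8000 and returns the usual OpenAI-style `choices[0].message.content` field.

```python
import json
import urllib.request


def build_chat_request(model, prompt, image_url):
    """Build the JSON body for an OpenAI-compatible /v1/chat/completions call."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }


def post_chat_request(base_url, payload, timeout=120):
    """POST the payload to the server and return the text of the first choice."""
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# Example usage (requires a running server, so it is left commented out):
# payload = build_chat_request(
#     "google/gemma-3-4b-it",
#     "Describe this image in one sentence.",
#     "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg",
# )
# print(post_chat_request("http://localhost:8000", payload))
```

Keeping the payload builder separate from the network call makes it easy to inspect the request body before sending it.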

Use Docker

docker model run hf.co/google/gemma-3-4b-it

SGLang

How to use google/gemma-3-4b-it with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "google/gemma-3-4b-it" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "google/gemma-3-4b-it",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "google/gemma-3-4b-it" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "google/gemma-3-4b-it",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Docker Model Runner
How to use google/gemma-3-4b-it with Docker Model Runner:
```
docker model run hf.co/google/gemma-3-4b-it
```

Batch processing on a GPU?

#32

by buckeye17-bah - opened Mar 28, 2025

Discussion

buckeye17-bah

Mar 28, 2025 • edited Mar 28, 2025

I'm pretty new to the transformers package. Can anyone provide example code for how a Gemma 3 VLM can be used to batch process images on a CUDA GPU? In my case, I have a list of local files that I want to process using a common prompt. Currently I'm only able to process each image sequentially on my CUDA GPU.

Ayorinha

Mar 28, 2025

To process images in bulk using the Gemma 3 VLM model on a CUDA GPU, you can use PyTorch along with Tesseract OCR to extract text from the images and then send those texts to the model for inference. First, install the necessary libraries like torch, transformers, pytesseract, and Pillow. Then, load the model and tokenizer using transformers, and use Tesseract to process each image individually. To optimize batch processing, you can loop through all the images in a directory and generate text for each of them. The code below illustrates this process, using the GPU to perform the inferences:


here !

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from PIL import Image
import pytesseract
import os

# Define the path to the Tesseract OCR executable (if necessary)
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"  # Adjust as needed

# Function to process an image with OCR
def process_image(image_path):
    # Load the image
    img = Image.open(image_path)

    # Use Tesseract to extract text
    text = pytesseract.image_to_string(img)

    return text

# Function to process the batch of images
def process_batch(image_paths, model, tokenizer, device):
    texts = []

    for image_path in image_paths:
        print(f"Processing {image_path}...")

        # Step 1: Process the image with OCR (convert image to text)
        ocr_text = process_image(image_path)

        # Step 2: Use the model for inference (based on the extracted text)
        inputs = tokenizer(ocr_text, return_tensors="pt").to(device)
        with torch.no_grad():
            outputs = model.generate(**inputs, max_length=1024)

        # Decode the model's response
        generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
        texts.append(generated_text)

    return texts

# Load the model and tokenizer
model_name = "gemma-3-4b-it"  # Or any other model you have
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).to('cuda')

# Path to the folder with the images
image_folder = "/path/to/your/images"

# List of image paths
image_paths = [os.path.join(image_folder, fname) for fname in os.listdir(image_folder) if fname.endswith('.jpg') or fname.endswith('.png')]

# Process the images in bulk
generated_texts = process_batch(image_paths, model, tokenizer, 'cuda')

# Display the results
for idx, generated_text in enumerate(generated_texts):
    print(f"Generated text for image {image_paths[idx]}: {generated_text}\n")

buckeye17-bah

Mar 28, 2025 • edited Mar 28, 2025

@Ayorinha thanks for replying. I'm guessing you asked an LLM my question and pasted the response? What you provided doesn't make sense. The code is using pytesseract to extract the text from my image then feeding the text into Gemma 3 without any prompt from me. It's treating Gemma 3 like an LLM rather than a VLM, and it doesn't provide any prompt. This is not how Gemma 3 is meant to be used. I should be feeding the image and my prompt into Gemma 3. My aim is to do visual question answering (VQA) of the images I have.

I should mention I have already consulted with Sonnet 3.7 on this question and it wasn't able to figure it out. Maybe a more experienced transformers user could coax the right answer out of it, but I couldn't.
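
For reference, feeding each image together with a shared prompt could look roughly like the sketch below. It is not a confirmed answer from this thread. It assumes a recent transformers release in which `processor.apply_chat_template` accepts a list of conversations with `padding=True` and accepts local files through a `"path"` key, and in which `AutoProcessor.from_pretrained` takes `padding_side="left"`. The helper names (`list_images`, `chunked`, `build_batch_messages`, `answer_batch`, `run`) and the batch size are hypothetical.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def list_images(folder):
    """Return the image files in `folder`, sorted for a reproducible order."""
    return sorted(p for p in Path(folder).iterdir() if p.suffix.lower() in IMAGE_SUFFIXES)


def chunked(items, size):
    """Yield consecutive slices of `items` holding at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_batch_messages(image_paths, prompt):
    """Build one single-turn conversation per image, all sharing the same prompt."""
    return [
        [
            {
                "role": "user",
                "content": [
                    # Assumption: local files are passed through the "path" key.
                    {"type": "image", "path": str(path)},
                    {"type": "text", "text": prompt},
                ],
            }
        ]
        for path in image_paths
    ]


def answer_batch(model, processor, image_paths, prompt, max_new_tokens=64):
    """Run one padded batch through the model and return one answer per image."""
    inputs = processor.apply_chat_template(
        build_batch_messages(image_paths, prompt),
        add_generation_prompt=True,
        tokenize=True,
        return_dict=True,
        return_tensors="pt",
        padding=True,  # pad the conversations to a common length
    ).to(model.device, dtype=model.dtype)
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # With left padding, every row's prompt ends at the same index,
    # so a single slice drops the prompt tokens for the whole batch.
    new_tokens = outputs[:, inputs["input_ids"].shape[-1]:]
    return processor.batch_decode(new_tokens, skip_special_tokens=True)


def run(folder, prompt, batch_size=4):
    """Load the model once, then answer `prompt` for every image in `folder`."""
    import torch
    from transformers import AutoModelForImageTextToText, AutoProcessor

    model_id = "google/gemma-3-4b-it"
    # Assumption: left padding is requested here so batched generation lines up.
    processor = AutoProcessor.from_pretrained(model_id, padding_side="left")
    model = AutoModelForImageTextToText.from_pretrained(
        model_id, torch_dtype=torch.bfloat16
    ).to("cuda")

    answers = {}
    for batch in chunked(list_images(folder), batch_size):
        for path, answer in zip(batch, answer_batch(model, processor, batch, prompt)):
            answers[str(path)] = answer
    return answers
```

Calling `run("/path/to/your/images", "What animal is on the candy?")` would then return a dictionary that maps each file path to the generated answer. The batch size that fits in memory depends on the GPU and the image sizes.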

Ayorinha

Mar 28, 2025 • edited Mar 28, 2025

Sorry, man.

Explain to me what you did.

OMG, should this work correctly for visual question answering (VQA)?

import torch
from transformers import AutoProcessor, AutoModelForVision2Seq
from PIL import Image
import os

BalakrishnaCh

Google org Oct 14, 2025

Hi,

Apologies for the late reply, and thanks for reaching out to us. Could you please confirm whether the above-mentioned issue is resolved, or whether you require any additional assistance?

Thanks.
