Instructions for using boopathiraj/gemma-3-instruct-small with libraries, notebooks, and local apps.

Libraries

How to use boopathiraj/gemma-3-instruct-small with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="boopathiraj/gemma-3-instruct-small")
messages = [
    {"role": "user", "content": "Who are you?"},
]
print(pipe(messages))

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("boopathiraj/gemma-3-instruct-small")
model = AutoModelForCausalLM.from_pretrained("boopathiraj/gemma-3-instruct-small")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use boopathiraj/gemma-3-instruct-small with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "boopathiraj/gemma-3-instruct-small"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "boopathiraj/gemma-3-instruct-small",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
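
The same request can be made from Python. The following is a minimal sketch using only the standard library; the helper names `build_chat_request` and `ask` are illustrative and not part of vLLM, and the default `base_url` assumes the server was started locally on port 8000 as above. The response layout (`choices[0].message.content`) is the usual OpenAI-compatible chat completion shape.

```python
import json
import urllib.request

MODEL_ID = "boopathiraj/gemma-3-instruct-small"


def build_chat_request(prompt, base_url="http://localhost:8000"):
    """Build (but do not send) a POST request for /v1/chat/completions."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt, base_url="http://localhost:8000"):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt, base_url)) as resp:
        body = json.load(resp)
    # OpenAI-style responses put the reply under choices[0].message.content.
    return body["choices"][0]["message"]["content"]


# Usage (requires a running server):
#   print(ask("What is the capital of France?"))
```

Passing `base_url="http://localhost:30000"` should target a server listening on port 30000 instead.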

Use Docker

docker model run hf.co/boopathiraj/gemma-3-instruct-small

SGLang

How to use boopathiraj/gemma-3-instruct-small with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "boopathiraj/gemma-3-instruct-small" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "boopathiraj/gemma-3-instruct-small",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "boopathiraj/gemma-3-instruct-small" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "boopathiraj/gemma-3-instruct-small",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use boopathiraj/gemma-3-instruct-small with Docker Model Runner:
```
docker model run hf.co/boopathiraj/gemma-3-instruct-small
```

Gemma-3 Instruct Small (LoRA Merged)

Model Summary

Gemma-3 Instruct Small is a lightweight instruction-following language model fine-tuned from Google’s Gemma-3-270M-IT with LoRA. The LoRA adapter weights were then merged into the base model for efficient inference.

The model is optimized for:

Instruction following
Basic mathematical reasoning
Short-form question answering
Educational and experimental use

Model Details

Model Description

Developed by: Boopathiraj
Organization: Self (Independent)
Model type: Causal Language Model (Instruction-tuned)
Language(s): English
License: Apache 2.0
Finetuned from: google/gemma-3-270m-it

This model was trained with parameter-efficient fine-tuning (LoRA). The resulting adapter was then merged into the base weights, so the model runs standalone without PEFT dependencies.
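
To illustrate what merging a LoRA adapter means, here is a toy sketch of the standard LoRA merge rule, W' = W + (alpha / r) * B A, in plain Python. The matrices, rank, and alpha below are made-up values for illustration; the card does not state the rank, alpha, or tooling actually used for this model.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(w, lora_a, lora_b, alpha, r):
    """Return W + (alpha / r) * (B @ A), the standard LoRA merge.

    w:      d_out x d_in base weight
    lora_a: r x d_in down-projection
    lora_b: d_out x r up-projection
    """
    scale = alpha / r
    delta = matmul(lora_b, lora_a)
    return [[wij + scale * dij for wij, dij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]


def apply(weight, x):
    """Compute weight @ x for a single input vector x."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in weight]


# Toy shapes (hypothetical values): d_out = 2, d_in = 3, rank r = 1.
W = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]]
A = [[0.5, -1.0, 2.0]]   # r x d_in
B = [[2.0], [4.0]]       # d_out x r
ALPHA, R = 2, 1
x = [1.0, 2.0, 3.0]

merged = merge_lora(W, A, B, ALPHA, R)

# Unmerged forward pass: base output plus the scaled adapter output.
base_out = apply(W, x)
adapter_out = apply(B, apply(A, x))
unmerged = [b + (ALPHA / R) * a for b, a in zip(base_out, adapter_out)]

# The merged weight gives the same result with a single matrix product.
assert apply(merged, x) == unmerged
```

After merging, inference needs only the single weight matrix, which is why the merged model loads with plain Transformers and no adapter library.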

Model Sources

Base Model: https://huggingface.co/google/gemma-3-270m-it
Repository: https://huggingface.co/boopathiraj/gemma-3-instruct-small

Uses

Direct Use

This model can be used directly for:

Instruction-based text generation
Simple math word problems
Educational demos
Lightweight inference on limited hardware

Example use cases:

Chatbots
Teaching assistants
Rapid prototyping
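
As a rough guide to what "limited hardware" means here, the weight memory can be estimated from the parameter count and the bytes per parameter. This sketch assumes the nominal 270M parameter count and counts weights only; activations, the KV cache, and framework overhead are not included.

```python
# Bytes needed to store one parameter in each dtype.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2}


def weight_memory_gb(num_params, dtype):
    """Approximate memory for the weights alone, in gigabytes (1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


NUM_PARAMS = 270_000_000  # nominal count, taken from the model name

for dtype in ("float32", "float16"):
    print(f"{dtype}: ~{weight_memory_gb(NUM_PARAMS, dtype):.2f} GB")
# float32: ~1.08 GB
# float16: ~0.54 GB
```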

Downstream Use

The model may be further fine-tuned for:

Domain-specific Q&A
Educational datasets
Small-scale reasoning benchmarks

Out-of-Scope Use

This model is not intended for:

Medical, legal, or financial advice
High-stakes decision making
Safety-critical applications
Long-context reasoning

Bias, Risks, and Limitations

Inherits biases from the base Gemma model and training data
Limited reasoning depth due to small parameter count (270M)
May produce incorrect or hallucinated answers
Performance degrades on long or multi-step reasoning tasks

Recommendations

Users should:

Validate outputs before use
Avoid high-risk domains
Treat results as assistive, not authoritative
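
For simple arithmetic prompts, validating an output can be as small as comparing the number the model produced against a known answer. The sketch below is one hypothetical way to do that; the helper names and the "take the last number in the text" heuristic are assumptions for illustration, not part of this model or of Transformers.

```python
import re

# Matches integers and decimals, with an optional leading minus sign.
NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


def last_number(text):
    """Return the last number found in text as a float, or None if there is none."""
    matches = NUMBER.findall(text)
    return float(matches[-1]) if matches else None


def check_answer(model_output, expected, tol=1e-6):
    """True if the last number in the model output is within tol of expected."""
    value = last_number(model_output)
    return value is not None and abs(value - expected) <= tol


print(check_answer("7 minus 3 is 4.", 4))    # True
print(check_answer("The answer is 5", 4))    # False
print(check_answer("I am not sure.", 4))     # False
```

The heuristic is deliberately crude: it will misjudge outputs that restate the question after the answer, so it suits quick spot checks rather than formal evaluation.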

How to Get Started

Inference Example

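from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load the slow (Python) tokenizer
tokenizer = AutoTokenizer.from_pretrained(
    "boopathiraj/gemma-3-instruct-small",
    use_fast=False
)

# Load the merged model in half precision and place it on the available device(s).
# Older transformers releases expect torch_dtype= instead of dtype=.
model = AutoModelForCausalLM.from_pretrained(
    "boopathiraj/gemma-3-instruct-small",
    device_map="auto",
    dtype=torch.float16
)

# Switch to inference mode (disables dropout)
model.eval()

prompt = "Solve the problem: What is 7 minus 3?"

# Tokenize the raw prompt (no chat template is applied here)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Generate up to 32 new tokens without tracking gradients
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=32)

# Decode the full sequence: the prompt followed by the generated continuation
print(tokenizer.decode(output[0], skip_special_tokens=True))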

Safetensors

Model size: 0.3B params
Tensor type: F32

Model tree for boopathiraj/gemma-3-instruct-small

Base model: google/gemma-3-270m
Finetuned: google/gemma-3-270m-it
Finetuned: this model