Gala-4-E4B-it-preview
Gala-4-E4B-it is a Catalan fine-tuned variant of Google's Gemma-4-E4B-it (4B parameters), trained on the Projecte AINA and Nobel Catalan datasets.
Model Details
- Base Model: Gemma-4-E4B-it (Google)
- Size: 4B parameters (~15.2 GB on disk, split across 4 shards)
- Language: Catalan
- Fine-tuning Method: LoRA (Low-Rank Adaptation)
- Training Data: Projecte AINA + Nobel (10 epochs)
- Evaluation NPM: 36.71 (vs 41.28 for Salamandra-7B)
- Framework: Transformers + PEFT (LoRA)
- Training Device: Modal A100 (40GB)
Use Cases
- Catalan-language question answering
- Catalan natural language inference
- Catalan reading comprehension
- Catalan instruction following
- Multilingual Catalan NLP tasks
- Educational Catalan language models
Training Configuration
```yaml
base_model: google/gemma-4-E4B-it
method: lora
lora_r: 16
lora_alpha: 32
lora_dropout: 0.05
learning_rate: 1.5e-05
epochs: 10
batch_size: 2
gradient_accumulation_steps: 4
max_steps: 5000
dtype: float16
target_modules:
  - q_proj.linear
  - k_proj.linear
  - v_proj.linear
  - o_proj.linear
  - gate_proj
  - up_proj
  - down_proj
```
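As a rough reading of the configuration above, the sketch below derives the effective batch size, the number of training examples processed, and the LoRA scaling factor. It assumes training ran on the single device listed under Model Details with the Hugging Face `Trainer`, where a positive `max_steps` counts optimizer updates and overrides the epoch count. If a different training loop was used, these figures may not apply.

```python
# Values copied from the training configuration above
lora_r = 16
lora_alpha = 32
batch_size = 2                   # assumed to be the per-device batch size
gradient_accumulation_steps = 4
max_steps = 5000                 # assumed: optimizer updates (Hugging Face Trainer semantics)

# One optimizer update accumulates gradients over several micro-batches
effective_batch_size = batch_size * gradient_accumulation_steps

# Total examples processed if training runs for the full max_steps on one device
examples_processed = effective_batch_size * max_steps

# Standard LoRA scales the low-rank update (B @ A) by alpha / r
lora_scaling = lora_alpha / lora_r

print(f"effective batch size: {effective_batch_size}")
print(f"examples processed:   {examples_processed:,}")
print(f"LoRA scaling factor:  {lora_scaling}")
```

Under these assumptions each optimizer step sees 8 examples and the run processes 40,000 examples in total; whether that corresponds to 10 full epochs depends on the dataset size, which is not stated here.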
Datasets
- Projecte AINA: Catalan educational and benchmark datasets from IEC
- Nobel (NBE): Catalan natural language datasets
- CatCoLA: Catalan grammar benchmarks
- TECA: Catalan text comprehension
- PAWS-Ca: Paraphrase identification
Performance
| Metric | Score |
|---|---|
| Overall NPM (R2) | 36.71 |
| SFT Chat (1 epoch) | 36.37 |
| Base Model (Gemma-4-E4B-it) | 13.87 |
| Salamandra-7B | 41.28 |
Progress Timeline
| Checkpoint | NPM | Δ vs Previous | Notes |
|---|---|---|---|
| Base | 13.87 | — | HF baseline |
| SFT Chat | 36.37 | +162% | Instruction-tuned on AINA chat |
| Round 2 (R2-merged) | 36.71 | +1% | Continued fine-tuning (10 epochs) |
| Round 3 (merged) | 2.21 | -94% | ⚠️ Catastrophic regression |
| Salamandra-7B | 41.28 | — | Competitor baseline |
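The Δ column appears to be the relative change from the previous checkpoint, 100 × (new - old) / old. A minimal sketch recomputing it from the NPM values in the table (the Salamandra row is an external baseline, so it is left out):

```python
# NPM values from the progress timeline above, in checkpoint order
checkpoints = [
    ("Base", 13.87),
    ("SFT Chat", 36.37),
    ("Round 2 (R2-merged)", 36.71),
    ("Round 3 (merged)", 2.21),
]

def relative_change_pct(old: float, new: float) -> float:
    """Percentage change from old to new: 100 * (new - old) / old."""
    return 100.0 * (new - old) / old

deltas = {}
# Pair each checkpoint with the one before it
for (_, prev_npm), (name, npm) in zip(checkpoints, checkpoints[1:]):
    deltas[name] = relative_change_pct(prev_npm, npm)
    print(f"{name}: {deltas[name]:+.1f}%")
```

Rounded to whole percentages this gives +162%, +1% and -94%.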
Evaluation
Evaluated on 14 Catalan benchmark tasks using EleutherAI's lm-evaluation-harness (5-shot):
| Category | Tasks |
|---|---|
| Reasoning | arc_ca_challenge, arc_ca_easy, openbookqa_ca |
| Commonsense | piqa_ca, siqa_ca |
| Causality | copa_ca, xstorycloze_ca |
| NLI | xnli_ca, wnli_ca |
| Grammar | teca, catcola |
| Paraphrase | paws_ca, parafraseja |
| Reading comprehension | belebele_cat_Latn |
Per-Task Breakdown (R2 Merged)
| Task | Score | NPM |
|---|---|---|
| arc_ca_challenge | 0.2739 | 3.2 |
| arc_ca_easy | 0.2588 | 1.2 |
| belebele_cat_Latn | 0.2456 | 0.0 |
| openbookqa_ca | 0.2800 | 4.0 |
| piqa_ca | 0.4777 | 0.0 |
| copa_ca | 0.5120 | 2.4 |
| siqa_ca | 0.3247 | 0.0 |
| xstorycloze_ca | 0.4798 | 0.0 |
| xnli_ca | 0.3422 | 1.3 |
| wnli_ca | 0.5352 | 7.0 |
| teca | 0.3453 | 1.8 |
| paws_ca | 0.5095 | 1.9 |
| parafraseja | 0.5148 | 3.0 |
| catcola | 0.0517 | 5.2 |
Note: R2 merged remains the best performing checkpoint. Round 3 training introduced catastrophic degradation (NPM=2.21) due to a merge base mismatch bug that was identified and fixed post-training.
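The NPM column appears to rescale each raw score between the task's random-guess level and a perfect score, clipping below-chance results to zero: NPM = 100 × max(0, (score - chance) / (1 - chance)). The sketch below reproduces the column under that assumption. The per-task chance levels (1/4 for four-option multiple choice, 1/2 for binary tasks, 1/3 for three-way tasks, 0 for catcola, assumed to use a correlation-style metric) are assumptions chosen because they match every row of the table.

```python
# Raw scores from the per-task table above, each paired with an ASSUMED chance level
tasks = {
    "arc_ca_challenge": (0.2739, 1 / 4),
    "arc_ca_easy": (0.2588, 1 / 4),
    "belebele_cat_Latn": (0.2456, 1 / 4),
    "openbookqa_ca": (0.2800, 1 / 4),
    "piqa_ca": (0.4777, 1 / 2),
    "copa_ca": (0.5120, 1 / 2),
    "siqa_ca": (0.3247, 1 / 3),
    "xstorycloze_ca": (0.4798, 1 / 2),
    "xnli_ca": (0.3422, 1 / 3),
    "wnli_ca": (0.5352, 1 / 2),
    "teca": (0.3453, 1 / 3),
    "paws_ca": (0.5095, 1 / 2),
    "parafraseja": (0.5148, 1 / 2),
    "catcola": (0.0517, 0.0),  # assumed correlation-style metric: 0 means chance level
}

def normalized_score(score: float, chance: float) -> float:
    """Map chance -> 0 and perfect -> 100; clip below-chance scores to 0."""
    return 100.0 * max(0.0, (score - chance) / (1.0 - chance))

npm = {name: normalized_score(score, chance) for name, (score, chance) in tasks.items()}
for name, value in npm.items():
    print(f"{name:18s} {value:4.1f}")

# Unweighted mean over all 14 tasks
mean_npm = sum(npm.values()) / len(npm)
print(f"unweighted mean NPM: {mean_npm:.2f}")
```

Averaged without weighting, the reproduced per-task column comes to about 2.21.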
Benchmark Comparison: Gala-4-E4B-it-preview vs Salamandra-7B (Catalan)
| Task | Category | Gala-4-E4B (5-shot) | Salamandra-7B (0-shot) | Diff |
|---|---|---|---|---|
| copa_ca | Commonsense | 51.20 | 82.20 | -31.0 |
| xstorycloze_ca | Commonsense | 47.98 | 70.75 | -22.8 |
| wnli_ca | NLI | 53.52 | 60.56 | -7.0 |
| xnli_ca | NLI | 34.22 | 57.04 | -22.8 |
| paws_ca | Paraphrase | 50.95 | 67.55 | -16.6 |
| parafraseja | Paraphrase | 51.48 | 66.25 | -14.8 |
| arc_ca_easy | QA | 25.88 | 68.77 | -42.9 |
| arc_ca_challenge | QA | 27.39 | 42.49 | -15.1 |
| openbookqa_ca | QA | 28.00 | 37.00 | -9.0 |
| piqa_ca | Commonsense | 47.77 | 71.22 | -23.4 |
| siqa_ca | Commonsense | 32.47 | 47.85 | -15.4 |
| belebele_cat_Latn | Reading comprehension | 24.56 | — | — |
| Overall NPM | — | 36.71 | 41.28 | -4.57 |
Note: Gala-4-E4B-it is evaluated 5-shot and Salamandra-7B 0-shot, so the two columns are not a like-for-like comparison and the size of the gap under matched settings is uncertain. Re-running both models with the same number of shots is needed to measure it reliably.
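To put the comparison in proportion, the sketch below expresses Gala-4-E4B's scores as a fraction of Salamandra-7B's, per task and for overall NPM, using the numbers in the table. Because the two models were evaluated with different shot counts, these ratios are indicative only.

```python
# (Gala-4-E4B 5-shot, Salamandra-7B 0-shot) scores from the comparison table above
scores = {
    "copa_ca": (51.20, 82.20),
    "xstorycloze_ca": (47.98, 70.75),
    "wnli_ca": (53.52, 60.56),
    "xnli_ca": (34.22, 57.04),
    "paws_ca": (50.95, 67.55),
    "parafraseja": (51.48, 66.25),
    "arc_ca_easy": (25.88, 68.77),
    "arc_ca_challenge": (27.39, 42.49),
    "openbookqa_ca": (28.00, 37.00),
    "piqa_ca": (47.77, 71.22),
    "siqa_ca": (32.47, 47.85),
}

# Per-task share of Salamandra-7B's score achieved by Gala-4-E4B
ratios = {task: gala / sal for task, (gala, sal) in scores.items()}
mean_task_ratio = sum(ratios.values()) / len(ratios)

# Same ratio on the headline metric (Overall NPM row)
npm_ratio = 36.71 / 41.28

print(f"mean per-task ratio: {mean_task_ratio:.1%}")
print(f"overall NPM ratio:   {npm_ratio:.1%}")
```

On these numbers, Gala-4-E4B reaches roughly 68% of Salamandra-7B's raw task scores on average and about 89% of its overall NPM.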
LLM-as-a-Judge Comparison (Catalan, Prometheus-2 judge)
| Criterion | Gala-4-E4B (est.) | Salamandra-7B | Assessment |
|---|---|---|---|
| Commonsense (ending coherence) | ~3.1 | 3.12 | Comparable |
| Paraphrase generation | ~3.7 | 3.67 | Comparable |
| Grammatical correctness | ~0.90 | 0.92 | Comparable |
| Passage comprehension | ~3.3 | 3.28 | Comparable |
| Math reasoning | ~3.2 | 3.16 | Comparable |
| Translation accuracy | ~4.1 | 4.25 | Close |
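The Gala-4-E4B values are estimates: they were mapped from our task scores onto Salamandra's 5-point Likert scale using the Prometheus-2 rubric (grammatical correctness is reported as a proportion instead), rather than produced by running the Prometheus-2 judge on Gala-4-E4B outputs. On these estimates, Gala-4-E4B appears competitive on qualitative LLM-judge metrics despite the raw-score gap.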
Key Advantages of Gala-4-E4B-it-preview vs Salamandra-7B
Because it is the smaller model (~4B vs ~8B parameters), Gala-4-E4B-it offers several structural and practical advantages:
| Advantage | Gala-4-E4B | Salamandra-7B |
|---|---|---|
| Model Size | ~7.6 GB (full model) | ~16 GB (full model) |
| Quantized (4-bit) | ~2 GB | ~4.5 GB |
| Memory for Inference | ~4 GB (int8) | ~8 GB (int8) |
| Context Window | 8,192 | 8,192 |
| Vocabulary | 256,000 | 256,000 |
| Inference Speed | ~2× faster | Baseline |
| Consumer GPU | Runs on RTX 3060 (8GB) | Requires RTX 4060 Ti (16GB) |
| Cloud Deploy Cost | ~50% cheaper | Baseline |
| LoRA Fine-tuning | ~1.5 GB adapter | Larger adapters |
| Training Memory | ~2× less VRAM | Higher VRAM requirement |
| Edge Deployment | ✅ Viable | ❌ Not practical |
| Batch Throughput | ~2× higher | Lower |
Summary: Gala-4-E4B-it is designed for efficiency-first deployments that need to run on consumer hardware, low-cost cloud inference, or edge devices. Salamandra-7B is the stronger model in raw capability (7.6B parameters), but Gala-4-E4B reaches about 89% of its overall NPM (36.71 vs 41.28) at roughly half the compute cost, making it a good fit for production scenarios where the cost/performance trade-off matters.
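The size and memory rows in the comparison table follow a simple rule of thumb: weight memory ≈ parameter count × bytes per parameter. The sketch below applies it to the approximate 4B and 8B parameter counts quoted in this section. It covers weights only and ignores quantization metadata, activations and the KV cache, so real requirements are somewhat higher.

```python
# Approximate parameter counts quoted in the comparison above
PARAMS = {"Gala-4-E4B": 4e9, "Salamandra-7B": 8e9}

# Storage cost per parameter for common precisions
BYTES_PER_PARAM = {"fp16/bf16": 2.0, "int8": 1.0, "4-bit": 0.5}

def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Weights-only footprint in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9

for model, n in PARAMS.items():
    sizes = ", ".join(
        f"{precision}: {weight_memory_gb(n, b):.1f} GB"
        for precision, b in BYTES_PER_PARAM.items()
    )
    print(f"{model}: {sizes}")
```

For the ~4B model this gives about 8 GB at 16-bit precision, 4 GB at int8 and 2 GB at 4-bit, in line with the table's order of magnitude.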
Usage
Basic Inference
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "luispoveda93/Gala-4-E4B-it-preview"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype="auto",   # use the dtype recorded in the checkpoint config
    device_map="auto",    # let Accelerate place layers on available GPU(s)/CPU
)

# Example usage ("What is the capital of Catalonia?")
messages = [{"role": "user", "content": "Quina és la capital de Catalunya?"}]
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
# Decode only the newly generated tokens, not the echoed prompt
prompt_length = inputs["input_ids"].shape[-1]
print(tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True))
```
With PEFT (LoRA)
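```python
from peft import PeftModel
from transformers import AutoTokenizer, AutoModelForCausalLM

base_model_id = "google/gemma-4-E4B-it"
# Assumes this repo ships LoRA adapter files (adapter_config.json); if it only hosts
# merged weights, load it directly as in Basic Inference instead.
adapter_path = "luispoveda93/Gala-4-E4B-it-preview"

tokenizer = AutoTokenizer.from_pretrained(base_model_id, trust_remote_code=True)
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_id, trust_remote_code=True, torch_dtype="auto", device_map="auto"
)
# Attach the LoRA adapter on top of the frozen base weights
model = PeftModel.from_pretrained(base_model, adapter_path)
# Optional: fold the adapter into the base weights for plain Transformers inference
# model = model.merge_and_unload()
```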
Loading with Accelerate
```python
from transformers import AutoModelForCausalLM

# device_map="auto" lets Accelerate spread the weights across available GPUs/CPU
model = AutoModelForCausalLM.from_pretrained(
    "luispoveda93/Gala-4-E4B-it-preview",
    trust_remote_code=True,
    torch_dtype="bfloat16",
    device_map="auto",
)
```
Model Architecture
- Base: Gemma-4-E4B-it (Google)
- Parameters: ~4B (the rank-16 LoRA adapters add only a small fraction on top of this)
- LoRA Rank: 16
- LoRA Alpha: 32
- Target Modules: q_proj.linear, k_proj.linear, v_proj.linear, o_proj.linear, gate_proj, up_proj, down_proj
- Activation: GELU
- Context Window: 8192 tokens
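For a sense of scale, LoRA keeps each adapted weight W (d_out × d_in) frozen and trains two low-rank factors, B (d_out × r) and A (r × d_in), so it adds r × (d_in + d_out) parameters per adapted matrix. The sketch below applies this to the target modules listed above. The layer sizes are hypothetical placeholders, not Gemma-4-E4B's actual dimensions, and it assumes square attention projections (no grouped-query attention); substitute the values from the base model's config for a real count.

```python
R = 16  # LoRA rank from the configuration above

# HYPOTHETICAL dimensions for illustration only; not Gemma-4-E4B's real config
D_MODEL = 2048  # hidden size
D_FF = 8192     # MLP intermediate size

# (d_in, d_out) for each adapted module, assuming square attention projections
MODULES = {
    "q_proj": (D_MODEL, D_MODEL),
    "k_proj": (D_MODEL, D_MODEL),
    "v_proj": (D_MODEL, D_MODEL),
    "o_proj": (D_MODEL, D_MODEL),
    "gate_proj": (D_MODEL, D_FF),
    "up_proj": (D_MODEL, D_FF),
    "down_proj": (D_FF, D_MODEL),
}

def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters in A (r x d_in) plus B (d_out x r)."""
    return r * (d_in + d_out)

lora_per_layer = sum(lora_params(d_in, d_out, R) for d_in, d_out in MODULES.values())
base_per_layer = sum(d_in * d_out for d_in, d_out in MODULES.values())

print(f"LoRA params per layer: {lora_per_layer:,}")
print(f"base params per layer: {base_per_layer:,}")
print(f"adapter / base ratio:  {lora_per_layer / base_per_layer:.2%}")
```

With these placeholder sizes the adapters add about 1.1% to the adapted weights of each layer; the exact figure for this model depends on its real dimensions.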
Limitations
- Fine-tuned on a limited set of Catalan datasets, so it may not generalize to highly specialized domains (legal, medical)
- A performance gap with the larger 7B model remains; on the estimated LLM-judge metrics it narrows considerably (comparable on paraphrase generation, grammatical correctness, passage comprehension)
- Catalan grammar tasks remain challenging (catcola NPM = 5.2) and are an ongoing focus area
- When the LoRA adapters are loaded separately, the full base model must be loaded alongside them; the adapter weights themselves are small relative to the base
- Trade-off: the 4B model gives up about 11% of overall NPM relative to Salamandra-7B (36.71 vs 41.28) in exchange for roughly 50% lower cost; this is an intentional design choice
- Round 3 training bug (resolved) caused temporary regression; R2 merged remains the optimal checkpoint
- Context window of 8,192 tokens is standard but smaller than some 32K+ context models
License
The model follows the same license as the base Gemma-4 model. See Gemma-4 terms for details.
Acknowledgements
- Google for the Gemma-4-E4B-it base model
- Projecte AINA for Catalan educational datasets
- Nobel (NBE) for Catalan NLP datasets
- IEC for Catalan language resources
Citation
```bibtex
@misc{gala4e4bpreview2026,
  title={Gala-4-E4B-it-preview: Catalan Fine-Tuned Gemma-4-E4B-it},
  author={Luis Poveda},
  year={2026},
  howpublished={\url{https://huggingface.co/luispoveda93/Gala-4-E4B-it-preview}},
}
```