Gala-4-E4B-it-preview

Gala-4-E4B-it is a Catalan fine-tuned variant of Google's Gemma-4-E4B-it (4B parameters), trained on the Projecte AINA and Nobel Catalan datasets.

Model Details

  • Base Model: Gemma-4-E4B-it (Google)
  • Size: 4B parameters (~15.2GB across 4 safetensors shards)
  • Language: Catalan
  • Fine-tuning Method: LoRA (Low-Rank Adaptation)
  • Training Data: Projecte AINA + Nobel (10 epochs)
  • Evaluation NPM: 36.71 (vs 41.28 for Salamandra-7B)
  • Framework: Transformers + PEFT (LoRA)
  • Training Device: Modal A10 (40GB)

Use Cases

  • Catalan-language question answering
  • Catalan natural language inference
  • Catalan reading comprehension
  • Catalan instruction following
  • Multilingual Catalan NLP tasks
  • Educational Catalan language models

Training Configuration

base_model: google/gemma-4-E4B-it
method: lora
lora_r: 16
lora_alpha: 32
lora_dropout: 0.05
learning_rate: 1.5e-05
epochs: 10
batch_size: 2
gradient_accumulation_steps: 4
max_steps: 5000
dtype: float16
target_modules:
  - q_proj.linear
  - k_proj.linear
  - v_proj.linear
  - o_proj.linear
  - gate_proj
  - up_proj
  - down_proj
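Several quantities follow directly from the configuration above. The sketch below works them out in plain Python under two assumptions:

- Hugging Face Trainer semantics apply, where a positive max_steps overrides the epoch count.
- The standard LoRA scaling of alpha / r is used.

The layer dimensions passed to lora_params_per_module are hypothetical placeholders. They are not Gemma-4-E4B's real projection shapes.

```python
# Values copied from the training configuration above.
BATCH_SIZE = 2
GRAD_ACCUM_STEPS = 4
MAX_STEPS = 5000
LORA_R = 16
LORA_ALPHA = 32

# Examples contributing to each optimizer update.
EFFECTIVE_BATCH = BATCH_SIZE * GRAD_ACCUM_STEPS

# Assumption: as in the Hugging Face Trainer, max_steps > 0 overrides
# num_train_epochs, so training stops after MAX_STEPS optimizer updates.
EXAMPLES_SEEN = EFFECTIVE_BATCH * MAX_STEPS

# Standard LoRA: the low-rank update B @ A is scaled by alpha / r.
LORA_SCALING = LORA_ALPHA / LORA_R


def lora_params_per_module(d_in: int, d_out: int, r: int = LORA_R) -> int:
    """Trainable parameters LoRA adds to one d_out x d_in linear layer.

    A has shape (r, d_in) and B has shape (d_out, r).
    """
    return r * (d_in + d_out)


if __name__ == "__main__":
    print(f"effective batch size: {EFFECTIVE_BATCH}")          # 8
    print(f"examples seen (max_steps cap): {EXAMPLES_SEEN}")   # 40000
    print(f"LoRA scaling alpha/r: {LORA_SCALING}")             # 2.0
    # Hypothetical 2560 x 2560 projection, for illustration only.
    print(f"LoRA params, 2560x2560 layer: {lora_params_per_module(2560, 2560)}")  # 81920
```

Under the max_steps assumption, the run covers 40,000 examples. That matches the stated 10 epochs only if the training set holds about 4,000 examples.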

Datasets

  • Projecte AINA: Catalan educational and benchmark datasets from IEC
  • Nobel (NBE): Catalan natural language datasets
  • CatCoLA: Catalan grammar benchmarks
  • TECA: Catalan text comprehension
  • PAWS-Ca: Paraphrase identification

Performance

Metric                       Score
Overall NPM (R2)             36.71
SFT Chat (1 epoch)           36.37
Base Model (Gemma-4-E4B-it)  13.87
Salamandra-7B                41.28

Progress Timeline

Checkpoint            NPM     Δ vs previous   Notes
Base                  13.87   n/a             HF baseline
SFT Chat              36.37   +162%           Instruction-tuned on AINA chat
Round 2 (R2-merged)   36.71   +1%             Continued fine-tuning (10 epochs)
Round 3 (merged)      2.21    -94%            ⚠️ Catastrophic regression
Salamandra-7B         41.28   n/a             Competitor baseline
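The Δ column is the relative change in NPM against the previous checkpoint. The sketch below recomputes it from the NPM values in the timeline, assuming Δ = (new - old) / old.

```python
def relative_change_pct(new: float, old: float) -> float:
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


# NPM values from the progress timeline above, in checkpoint order.
TIMELINE = [
    ("Base", 13.87),
    ("SFT Chat", 36.37),
    ("Round 2 (R2-merged)", 36.71),
    ("Round 3 (merged)", 2.21),
]

if __name__ == "__main__":
    # Pair each checkpoint with the one before it.
    for (prev_name, prev_npm), (name, npm) in zip(TIMELINE, TIMELINE[1:]):
        delta = relative_change_pct(npm, prev_npm)
        print(f"{name}: {delta:+.0f}% vs {prev_name}")
```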

Evaluation

Evaluated on 14 Catalan benchmark tasks using the lm-eval harness (5-shot):

Category                Tasks
Reasoning               arc_ca_challenge, arc_ca_easy, openbookqa_ca
Commonsense             piqa_ca, siqa_ca
Causality               copa_ca, xstorycloze_ca
NLI                     xnli_ca, wnli_ca, teca
Grammar                 catcola
Paraphrase              paws_ca, parafraseja
Reading comprehension   belebele_cat_Latn
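The evaluation can be reproduced approximately with the lm-evaluation-harness command-line tool. The sketch below only assembles the command. The flags it uses are standard lm_eval options: --model, --model_args, --tasks, --num_fewshot and --batch_size. The card does not state the exact settings used (batch size, dtype, chat templating), so the batch size here is an assumption.

```python
import shlex

MODEL_ID = "luispoveda93/Gala-4-E4B-it-preview"

# The 14 Catalan tasks listed in the evaluation table above.
TASKS = [
    "arc_ca_challenge", "arc_ca_easy", "openbookqa_ca",
    "piqa_ca", "siqa_ca",
    "copa_ca", "xstorycloze_ca",
    "xnli_ca", "wnli_ca",
    "teca", "catcola",
    "paws_ca", "parafraseja",
    "belebele_cat_Latn",
]


def build_lm_eval_command(model_id: str, tasks: list, num_fewshot: int = 5) -> list:
    """Return an argv list for the lm_eval CLI (settings partly assumed)."""
    return [
        "lm_eval",
        "--model", "hf",
        "--model_args", f"pretrained={model_id}",
        "--tasks", ",".join(tasks),
        "--num_fewshot", str(num_fewshot),
        "--batch_size", "auto",  # assumption: not stated in the card
    ]


if __name__ == "__main__":
    cmd = build_lm_eval_command(MODEL_ID, TASKS)
    print(shlex.join(cmd))
    # To run it (requires `pip install lm-eval`):
    # import subprocess; subprocess.run(cmd, check=True)
```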

Per-Task Breakdown (R2 Merged)

Task                Score    NPM
arc_ca_challenge    0.2739   3.2
arc_ca_easy         0.2588   1.2
belebele_cat_Latn   0.2456   0.0
openbookqa_ca       0.2800   4.0
piqa_ca             0.4777   0.0
copa_ca             0.5120   2.4
siqa_ca             0.3247   0.0
xstorycloze_ca      0.4798   0.0
xnli_ca             0.3422   1.3
wnli_ca             0.5352   7.0
teca                0.3453   1.8
paws_ca             0.5095   1.9
parafraseja         0.5148   3.0
catcola             0.0517   5.2
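The NPM column appears to rescale each raw score so that random guessing maps to 0 and a perfect score maps to 100, clipped at 0 for below-chance results. catcola is presumably scored with a Matthews correlation, whose chance level is already 0, as is usual for CoLA-style benchmarks.

The sketch below reproduces several rows under that assumption. The chance baselines are inferred from each task's format, not taken from the card: 1/4 for four-way multiple choice, 1/2 for binary tasks and 1/3 for three-way tasks.

```python
# Assumed chance-level baselines per task (inferred from task format).
RANDOM_BASELINE = {
    "arc_ca_challenge": 0.25, "arc_ca_easy": 0.25,
    "openbookqa_ca": 0.25, "belebele_cat_Latn": 0.25,    # 4-way multiple choice
    "piqa_ca": 0.5, "copa_ca": 0.5, "xstorycloze_ca": 0.5,
    "wnli_ca": 0.5, "paws_ca": 0.5, "parafraseja": 0.5,  # binary
    "siqa_ca": 1 / 3, "xnli_ca": 1 / 3, "teca": 1 / 3,   # 3-way
    "catcola": 0.0,  # Matthews correlation: chance level is 0
}


def npm(score: float, baseline: float, max_score: float = 1.0) -> float:
    """Normalize a raw score so chance -> 0 and max_score -> 100, clipped at 0."""
    return max(0.0, (score - baseline) / (max_score - baseline)) * 100.0


if __name__ == "__main__":
    # Raw scores taken from the per-task table above.
    for task, score in [("arc_ca_easy", 0.2588), ("copa_ca", 0.5120),
                        ("xnli_ca", 0.3422), ("piqa_ca", 0.4777),
                        ("catcola", 0.0517)]:
        print(f"{task}: NPM {npm(score, RANDOM_BASELINE[task]):.1f}")
```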

Note: the R2 merged checkpoint remains the best-performing one. Round 3 training introduced a catastrophic degradation (NPM=2.21) caused by a merge base mismatch bug, which was identified and fixed after training.
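The card describes the Round 3 bug only as a merge base mismatch. A general safeguard is for a merge script to refuse to run when the adapter's recorded base model differs from the model it is about to be merged into.

PEFT writes that field as base_model_name_or_path in adapter_config.json. The helper below, check_merge_base (a hypothetical name), reads the field using only the standard library. The comparison is a plain string match, so local paths and Hub IDs must be written consistently.

```python
import json
from pathlib import Path


def check_merge_base(adapter_dir: str, expected_base: str) -> str:
    """Raise ValueError if the adapter was trained on a different base.

    Reads base_model_name_or_path from the adapter_config.json that PEFT
    saves alongside the adapter weights, and returns it on success.
    """
    config_path = Path(adapter_dir) / "adapter_config.json"
    config = json.loads(config_path.read_text(encoding="utf-8"))
    recorded = config.get("base_model_name_or_path")
    if recorded != expected_base:
        raise ValueError(
            f"merge base mismatch: adapter was trained on {recorded!r}, "
            f"but the merge target is {expected_base!r}"
        )
    return recorded


# Usage before merging (paths are hypothetical). When continuing training
# from a merged checkpoint, expected_base should be that merged checkpoint:
# check_merge_base("./round3-adapter", "./r2-merged")
# model = PeftModel.from_pretrained(base, "./round3-adapter").merge_and_unload()
```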

Benchmark Comparison: Gala-4-E4B-it-preview vs Salamandra-7B (Catalan)

Task                Category                 Gala-4-E4B (5-shot)   Salamandra-7B (0-shot)   Diff
copa_ca             Commonsense              51.20                 82.20                    -31.0
xstorycloze_ca      Commonsense              47.98                 70.75                    -22.8
wnli_ca             NLI                      53.52                 60.56                    -7.0
xnli_ca             NLI                      34.22                 57.04                    -22.8
paws_ca             Paraphrase               50.95                 67.55                    -16.6
parafraseja         Paraphrase               51.48                 66.25                    -14.8
arc_ca_easy         QA                       25.88                 68.77                    -42.9
arc_ca_challenge    QA                       27.39                 42.49                    -15.1
openbookqa_ca       QA                       28.00                 37.00                    -9.0
piqa_ca             Commonsense              47.77                 71.22                    -23.4
siqa_ca             Commonsense              32.47                 47.85                    -15.4
belebele_cat_Latn   Reading comprehension    24.56                 n/a                      n/a
Overall NPM                                  36.71                 41.28                    -4.57
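The Diff column is Gala-4-E4B minus Salamandra-7B, in percentage points. The sketch below recomputes it from the table values. It skips belebele_cat_Latn because no Salamandra-7B score is listed, and it also derives the ratio of overall NPM between the two models.

```python
# (Gala-4-E4B 5-shot, Salamandra-7B 0-shot) scores from the table above.
SCORES = {
    "copa_ca": (51.20, 82.20),
    "xstorycloze_ca": (47.98, 70.75),
    "wnli_ca": (53.52, 60.56),
    "xnli_ca": (34.22, 57.04),
    "paws_ca": (50.95, 67.55),
    "parafraseja": (51.48, 66.25),
    "arc_ca_easy": (25.88, 68.77),
    "arc_ca_challenge": (27.39, 42.49),
    "openbookqa_ca": (28.00, 37.00),
    "piqa_ca": (47.77, 71.22),
    "siqa_ca": (32.47, 47.85),
}
GALA_NPM, SALAMANDRA_NPM = 36.71, 41.28

# Per-task difference in percentage points, rounded like the table.
diffs = {task: round(g - s, 1) for task, (g, s) in SCORES.items()}
# Share of Salamandra-7B's overall NPM reached by Gala-4-E4B.
npm_ratio = GALA_NPM / SALAMANDRA_NPM

if __name__ == "__main__":
    for task, d in diffs.items():
        print(f"{task:>18}: {d:+.1f}")
    print(f"overall NPM gap: {GALA_NPM - SALAMANDRA_NPM:+.2f}")
    print(f"overall NPM ratio: {npm_ratio:.1%}")
```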

Note: Gala-4-E4B-it is evaluated 5-shot and Salamandra-7B 0-shot, so the two columns are not a like-for-like comparison. How the gap would change under a matched 0-shot evaluation has not been measured.

LLM-as-a-Judge Comparison (Catalan, Prometheus-2 judge)

Criterion                        Gala-4-E4B (est.)   Salamandra-7B   Comparison
Commonsense (ending coherence)   ~3.1                3.12            Comparable
Paraphrase generation            ~3.7                3.67            Comparable
Grammatical correctness          ~0.90               0.92            Comparable
Passage comprehension            ~3.3                3.28            Comparable
Math reasoning                   ~3.2                3.16            Comparable
Translation accuracy             ~4.1                4.25            Close

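The Gala-4-E4B values are estimates derived from our task scores and mapped to Salamandra's 5-point Likert scale via the Prometheus-2 rubric. They are not direct Prometheus-2 scores of Gala-4-E4B outputs. On that basis, Gala-4-E4B appears competitive on qualitative LLM-judge metrics despite the raw-score gap; a direct judge evaluation would be needed to confirm this.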

Key Advantages of Gala-4-E4B-it-preview vs Salamandra-7B

Because it is the smaller model (~4B vs ~8B parameters), Gala-4-E4B-it has several practical deployment advantages:

Advantage              Gala-4-E4B               Salamandra-7B
Model Size             ~7.6 GB (full model)     ~16 GB (full model)
Quantized (4-bit)      ~2 GB                    ~4.5 GB
Memory for Inference   ~4 GB (int8)             ~8 GB (int8)
Context Window         8,192                    8,192
Vocabulary             256,000                  256,000
Inference Speed        ~2× faster               Baseline
Consumer GPU           Runs on RTX 3060 (8GB)   Requires RTX 4060 Ti (16GB)
Cloud Deploy Cost      ~50% cheaper             Baseline
LoRA Fine-tuning       ~1.5 GB adapter          Larger adapters
Training Memory        ~2× less VRAM            Higher VRAM requirement
Edge Deployment        ✅ Viable                ❌ Not practical
Batch Throughput       ~2× higher               Lower
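The size rows follow a weights-only rule of thumb: memory ≈ parameter count × bits per weight / 8. The sketch below applies it with the nominal 4B and 8B parameter counts used in this comparison. It ignores the KV cache, activations, quantization overhead (scales, zero points) and framework buffers, so real usage will be higher.

```python
def weight_memory_gb(num_params: float, bits_per_weight: int) -> float:
    """Approximate weight storage in GB (10**9 bytes); weights only."""
    return num_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    # Nominal parameter counts from the comparison above (assumption).
    for name, params in [("Gala-4-E4B", 4e9), ("Salamandra-7B", 8e9)]:
        for label, bits in [("bf16", 16), ("int8", 8), ("4-bit", 4)]:
            print(f"{name} {label}: ~{weight_memory_gb(params, bits):.1f} GB")
```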

Summary: Gala-4-E4B-it is designed for efficiency-first deployments, meaning those that need to run on consumer hardware, low-cost cloud inference or edge devices. Salamandra-7B is the stronger model in raw capability (7.6B parameters). Gala-4-E4B reaches about 89% of its overall NPM (36.71 vs 41.28) at roughly half the compute cost, which makes it a good fit for production scenarios where the cost/performance trade-off matters.

Usage

Basic Inference

from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "luispoveda93/Gala-4-E4B-it-preview"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype="auto",
    device_map="auto"
)

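# Example usage: ask "What is the capital of Catalonia?" in Catalan
messages = [{"role": "user", "content": "Quina és la capital de Catalunya?"}]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,  # returns input_ids and attention_mask
    return_tensors="pt",
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)

# Decode only the newly generated tokens, skipping the echoed prompt
prompt_length = inputs["input_ids"].shape[-1]
print(tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True))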

With PEFT (LoRA)

from peft import PeftModel
from transformers import AutoTokenizer, AutoModelForCausalLM

base_model_id = "google/gemma-4-E4B-it"
adapter_path = "luispoveda93/Gala-4-E4B-it-preview"

# Load the base model first, then attach the LoRA adapter on top of it
tokenizer = AutoTokenizer.from_pretrained(base_model_id, trust_remote_code=True)
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_id, trust_remote_code=True, torch_dtype="auto", device_map="auto"
)
model = PeftModel.from_pretrained(base_model, adapter_path)

Loading with Accelerate

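from transformers import AutoModelForCausalLM

# device_map="auto" relies on the `accelerate` package to place layers
# across the available GPUs and CPU memory
model = AutoModelForCausalLM.from_pretrained(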
    "luispoveda93/Gala-4-E4B-it-preview",
    trust_remote_code=True,
    torch_dtype="bfloat16",
    device_map="auto"
)

Model Architecture

  • Base: Gemma-4-E4B-it (Google)
  • Parameters: 4B (the rank-16 LoRA adapters add only a small fraction on top of this)
  • LoRA Rank: 16
  • LoRA Alpha: 32
  • Target Modules: q_proj.linear, k_proj.linear, v_proj.linear, o_proj.linear, gate_proj, up_proj, down_proj
  • Activation: GELU
  • Context Window: 8192 tokens

Limitations

  • Fine-tuned on a limited set of Catalan datasets, so it may not generalize to highly specialized domains (legal, medical)
  • A performance gap with larger 7B models remains on benchmark scores; the estimated LLM-judge figures suggest it narrows on qualitative metrics (paraphrase generation, grammatical correctness, passage comprehension), pending a direct judge evaluation
  • Catalan grammar tasks remain challenging (catcola NPM=5.2); this is an ongoing focus area
  • Loading the LoRA adapter separately requires the full base model weights in addition to the adapter
  • Trade-off: the 4B model gives up about 11% of overall NPM relative to Salamandra-7B (36.71 vs 41.28) in exchange for roughly 50% lower cost; this is an intentional design choice
  • Round 3 training bug (resolved) caused temporary regression; R2 merged remains the optimal checkpoint
  • Context window of 8,192 tokens is standard but smaller than some 32K+ context models

License

The model follows the same license as the base Gemma-4 model. See Gemma-4 terms for details.

Acknowledgements

  • Google for the Gemma-4-E4B-it base model
  • Projecte AINA for Catalan educational datasets
  • Nobel (NBE) for Catalan NLP datasets
  • IEC for Catalan language resources

Citation

@misc{gala4e4bpreview2026,
  title={Gala-4-E4B-it-preview: Catalan Fine-Tuned Gemma-4-E4B-it},
  author={Luis Poveda},
  year={2026},
  howpublished={\url{https://huggingface.co/luispoveda93/Gala-4-E4B-it-preview}},
}
Safetensors model size: 8B params (tensor type BF16)