gemma-26b-firefly-reasoning
An experimental, security-tuned Gemma model. While it excels at one-shot classification, its performance regresses in multi-step tasks.
Training recipe
- Base: `mlx-community/gemma-4-26b-a4b-it-bf16` (30 layers, 128 experts × top-k 8, moe_intermediate_size 704)
- Corpus: `trevon/lowlevel-security` → `sft/multitask_v12_train.jsonl` (10,177 conversational rows, 30.6% clean / 69.4% vuln; a class-balance check is sketched after this list)
- LoRA: r=8, α=2, targets `q_proj,k_proj,v_proj,o_proj,gate_proj,up_proj,down_proj`. The MoE expert `SwitchLinear` targets are wrapped with a per-expert LoRA (`(num_experts, in, rank)` and `(num_experts, rank, out)` tensors gathered via `mx.gather_mm`; see the sketch after this list).
- Trainable: 333.7 M params (3.78% of model), dominated by per-expert MoE deltas.
- LR: 1e-4 with a declared 40-step warmup + cosine decay (note: due to a known mlx-vlm grad-accum × LR-schedule bug, the schedule was effectively constant, same as v9 firefly, so the two runs are directly comparable; the declared schedule is sketched after this list).
- 400 iters, batch size 1, grad accum 4, train-on-completions (see the loss-masking sketch after this list), assistant_id 77091.
- Selected step: 150 (best of 16 saved checkpoints by single-gate voting eval).
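
The reported 30.6% / 69.4% split can be reproduced with a few lines over the JSONL file. A minimal sketch, assuming each row carries a top-level `"label"` field with values like `"clean"`/`"vuln"`; the actual schema of `multitask_v12_train.jsonl` may differ.

```python
import json
from collections import Counter

# Count the class balance of the training corpus.
# Assumes a top-level "label" field per row ("clean" / "vuln");
# the real schema of multitask_v12_train.jsonl may differ.
counts = Counter()
with open("sft/multitask_v12_train.jsonl") as f:
    for line in f:
        counts[json.loads(line)["label"]] += 1

total = sum(counts.values())
for label, n in sorted(counts.items()):
    print(f"{label}: {n} ({100 * n / total:.1f}%)")
```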
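
For readers unfamiliar with per-expert LoRA over MoE layers, here is a minimal MLX sketch of the shape convention described above. `PerExpertLoRA` is a hypothetical name, and the initialization and input layout are assumptions; in the real adapter this delta is added to the frozen `SwitchLinear` output.

```python
import mlx.core as mx
import mlx.nn as nn

class PerExpertLoRA(nn.Module):
    """Hypothetical sketch of a per-expert LoRA delta for a SwitchLinear
    weight; names, init, and input layout are assumptions, not the
    actual training code."""

    def __init__(self, num_experts: int, in_dims: int, out_dims: int,
                 rank: int = 8, alpha: float = 2.0):
        super().__init__()
        self.scale = alpha / rank
        # One low-rank pair per expert, matching the shapes in the recipe:
        # (num_experts, in, rank) and (num_experts, rank, out).
        self.lora_a = 1e-2 * mx.random.normal((num_experts, in_dims, rank))
        self.lora_b = mx.zeros((num_experts, rank, out_dims))

    def __call__(self, x: mx.array, indices: mx.array) -> mx.array:
        # Assumed layout: x is (..., top_k, 1, in_dims) and indices is
        # (..., top_k) holding the router's expert ids. mx.gather_mm fuses
        # the expert gather with the matmul, so only the selected experts'
        # A/B factors are touched per token.
        h = mx.gather_mm(x, self.lora_a, rhs_indices=indices)
        return self.scale * mx.gather_mm(h, self.lora_b, rhs_indices=indices)
```

The adapted layer would return the base `SwitchLinear` output plus this delta; at r=8 and α=2 the scale works out to 0.25.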
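
The declared (but, per the bug note, not effectively applied) schedule corresponds to something like the following in `mlx.optimizers`; starting the warmup at 0 and the choice of Adam are assumptions.

```python
import mlx.optimizers as optim

# Declared schedule: 40-step linear warmup to 1e-4, then cosine decay
# over the remaining 360 of the 400 iterations. Because of the
# grad-accum × LR-schedule bug noted above, the run effectively saw a
# constant 1e-4 instead.
warmup = optim.linear_schedule(0.0, 1e-4, steps=40)
cosine = optim.cosine_decay(1e-4, decay_steps=360)
lr_schedule = optim.join_schedules([warmup, cosine], [40])

optimizer = optim.Adam(learning_rate=lr_schedule)
```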
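
Train-on-completions means cross-entropy is computed only over assistant (completion) tokens. A minimal sketch of that masking, assuming a precomputed 0/1 mask; how the mask is derived from `assistant_id` 77091 in the actual pipeline is not specified here.

```python
import mlx.core as mx
import mlx.nn as nn

# Train-on-completions loss: per-token cross-entropy, zeroed on prompt
# positions. `mask` is 1 on assistant tokens and 0 elsewhere; deriving
# it from the assistant_id token is left abstract.
def completions_loss(logits: mx.array, targets: mx.array, mask: mx.array) -> mx.array:
    ce = nn.losses.cross_entropy(logits, targets, reduction="none")
    return (ce * mask).sum() / mx.maximum(mask.sum(), 1)
```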