gemma-26b-firefly-reasoning
An experimental, security-tuned Gemma model. While it excels at one-shot classification, its performance regresses in multi-step tasks.
Training recipe
- Base: `mlx-community/gemma-4-26b-a4b-it-bf16` (30 layers, 128 experts × top-k 8, moe_intermediate_size 704)
- Corpus: `trevon/lowlevel-security` → `sft/multitask_v12_train.jsonl` (10,177 conversational rows, 30.6% clean / 69.4% vuln; a class-balance check is sketched after this list)
- LoRA: r=8, α=2, targets `q_proj,k_proj,v_proj,o_proj,gate_proj,up_proj,down_proj`. The MoE expert `SwitchLinear` targets are wrapped with a per-expert LoRA (`(num_experts, in, rank)` and `(num_experts, rank, out)` tensors gathered via `mx.gather_mm`; see the sketch after this list).
- Trainable: 333.7 M params (3.78% of model), dominated by per-expert MoE deltas.
- LR: 1e-4 with a declared 40-step warmup + cosine decay (note: due to a known mlx-vlm grad-accum × LR-schedule bug, the schedule was effectively constant, same as v9 firefly, so the two runs are directly comparable; the declared schedule is sketched after this list).
- 400 iters, batch size 1, grad accum 4, train-on-completions (see the loss-masking sketch after this list), assistant_id 77091.
- Selected step: 150 (best of 16 saved checkpoints by single-gate voting eval).
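
The reported 30.6% / 69.4% split can be reproduced with a few lines over the JSONL file. A minimal sketch, assuming each row carries a top-level `"label"` field with values like `"clean"`/`"vuln"`; the actual schema of `multitask_v12_train.jsonl` may differ.

```python
import json
from collections import Counter

# Count the class balance of the training corpus.
# Assumes a top-level "label" field per row ("clean" / "vuln");
# the real schema of multitask_v12_train.jsonl may differ.
counts = Counter()
with open("sft/multitask_v12_train.jsonl") as f:
    for line in f:
        counts[json.loads(line)["label"]] += 1

total = sum(counts.values())
for label, n in sorted(counts.items()):
    print(f"{label}: {n} ({100 * n / total:.1f}%)")
```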
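
For readers unfamiliar with per-expert LoRA over MoE layers, here is a minimal MLX sketch of the shape convention described above. `PerExpertLoRA` is a hypothetical name, and the initialization and input layout are assumptions; in the real adapter this delta is added to the frozen `SwitchLinear` output.

```python
import mlx.core as mx
import mlx.nn as nn

class PerExpertLoRA(nn.Module):
    """Hypothetical sketch of a per-expert LoRA delta for a SwitchLinear
    weight; names, init, and input layout are assumptions, not the
    actual training code."""

    def __init__(self, num_experts: int, in_dims: int, out_dims: int,
                 rank: int = 8, alpha: float = 2.0):
        super().__init__()
        self.scale = alpha / rank
        # One low-rank pair per expert, matching the shapes in the recipe:
        # (num_experts, in, rank) and (num_experts, rank, out).
        self.lora_a = 1e-2 * mx.random.normal((num_experts, in_dims, rank))
        self.lora_b = mx.zeros((num_experts, rank, out_dims))

    def __call__(self, x: mx.array, indices: mx.array) -> mx.array:
        # Assumed layout: x is (..., top_k, 1, in_dims) and indices is
        # (..., top_k) holding the router's expert ids. mx.gather_mm fuses
        # the expert gather with the matmul, so only the selected experts'
        # A/B factors are touched per token.
        h = mx.gather_mm(x, self.lora_a, rhs_indices=indices)
        return self.scale * mx.gather_mm(h, self.lora_b, rhs_indices=indices)
```

The adapted layer would return the base `SwitchLinear` output plus this delta; at r=8 and α=2 the scale works out to 0.25.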
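
The declared (but, per the bug note, not effectively applied) schedule corresponds to something like the following in `mlx.optimizers`; starting the warmup at 0 and the choice of Adam are assumptions.

```python
import mlx.optimizers as optim

# Declared schedule: 40-step linear warmup to 1e-4, then cosine decay
# over the remaining 360 of the 400 iterations. Because of the
# grad-accum × LR-schedule bug noted above, the run effectively saw a
# constant 1e-4 instead.
warmup = optim.linear_schedule(0.0, 1e-4, steps=40)
cosine = optim.cosine_decay(1e-4, decay_steps=360)
lr_schedule = optim.join_schedules([warmup, cosine], [40])

optimizer = optim.Adam(learning_rate=lr_schedule)
```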
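
Train-on-completions means cross-entropy is computed only over assistant (completion) tokens. A minimal sketch of that masking, assuming a precomputed 0/1 mask; how the mask is derived from `assistant_id` 77091 in the actual pipeline is not specified here.

```python
import mlx.core as mx
import mlx.nn as nn

# Train-on-completions loss: per-token cross-entropy, zeroed on prompt
# positions. `mask` is 1 on assistant tokens and 0 elsewhere; deriving
# it from the assistant_id token is left abstract.
def completions_loss(logits: mx.array, targets: mx.array, mask: mx.array) -> mx.array:
    ce = nn.losses.cross_entropy(logits, targets, reduction="none")
    return (ce * mask).sum() / mx.maximum(mask.sum(), 1)
```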