Loracle CISPO v7

Interpreter LoRA for verbalizing behaviors encoded in fine-tuning weight deltas. Trained via offline CISPO (MiniMax-M1, arXiv:2506.13585) with Dr. GRPO advantages (arXiv:2503.20783) on K=8 judge-scored rollouts per question from held-out IA LoRAs (the DPO-holdout IA split, which was excluded from SFT).

Eval

Set                      pass@N   rollout-mean
AuditBench (56 orgs)     73.2%    47.9%
heldout_ia_v2 (20 orgs)  80.0%    73.3%
ood_models_v3 (23 orgs)  47.8%    12.8%
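The relationship between the two columns can be sketched as follows, assuming a per-rollout correctness judgment (the actual judging criteria live in the eval harness and are not shown here; function names are illustrative):

```python
import numpy as np

def pass_at_n(correct_per_rollout):
    # pass@N: fraction of questions where at least one of the
    # N rollouts was judged correct.
    c = np.asarray(correct_per_rollout, dtype=bool)  # [questions, N]
    return float(c.any(axis=1).mean())

def rollout_mean(correct_per_rollout):
    # rollout-mean: fraction of all individual rollouts judged correct,
    # so it is always <= pass@N.
    c = np.asarray(correct_per_rollout, dtype=bool)
    return float(c.mean())
```

A large gap between the two (as on ood_models_v3) means the model sometimes lands on the right answer but is not reliable across samples.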

Training

  • Loss: CISPO (paper Eq. 4, unbiased): -sg(clip(rho)) * A * log pi_theta per token, summed over all tokens in the group and divided by the total token count sum |o_i|; sg = stop-gradient, rho = per-token ratio pi_theta / pi_ref
  • Advantage: Dr. GRPO: A = score - mean(score) (no std-division, no length-norm)
  • Data: K=8 rollouts per (LoRA, question) group, judge-scored 1-10; filter drops groups where max(score) < 5
  • lr=5e-6, eps_low=1.0 (no lower clip, per paper), eps_high=2.0 (max ratio = 3.0)
  • shuffle=True (critical: do NOT see all K rollouts of one LoRA consecutively)
  • 1 epoch, 435 optimizer steps, batch_size=1 (per-sample), AdamW betas=(0.9, 0.95)
  • Base: Qwen/Qwen3-14B (instruct), interpreter rank=256, alpha=32, all 7 mag7 modules
  • Reference: frozen SFT checkpoint (loracle_k16_uber_v3_sft), per-token ref log-probs precomputed once
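A minimal numpy sketch of the recipe above (advantage, group filter, and loss; forward values only, so the stop-gradient on the clipped ratio is implicit, and function names are illustrative):

```python
import numpy as np

def dr_grpo_advantages(scores):
    # Dr. GRPO advantage: center on the group mean only --
    # no std division, no length normalization (arXiv:2503.20783).
    scores = np.asarray(scores, dtype=np.float64)
    return scores - scores.mean()

def keep_group(scores, min_max_score=5):
    # Data filter: drop a (LoRA, question) group if no rollout scored >= 5.
    return max(scores) >= min_max_score

def cispo_loss(logp, ref_logp, adv, eps_low=1.0, eps_high=2.0):
    # Offline CISPO (MiniMax-M1, Eq. 4): weight each token's -log pi_theta
    # by the clipped importance ratio (stop-gradiented in training) and the
    # rollout advantage, then divide by the group's total token count.
    # logp / ref_logp: lists of per-token log-prob arrays, one per rollout.
    total, n_tokens = 0.0, 0
    for lp, rlp, a in zip(logp, ref_logp, adv):
        lp, rlp = np.asarray(lp), np.asarray(rlp)
        rho = np.exp(lp - rlp)                                 # pi_theta / pi_ref
        clipped = np.clip(rho, 1.0 - eps_low, 1.0 + eps_high)  # [0, 3]
        total += -(clipped * a * lp).sum()
        n_tokens += lp.size
    return total / n_tokens
```

With eps_low=1.0 the lower clip bound is 0, which never binds since rho >= 0, matching the "no lower clip" note above.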

Loading

Feed the direction tokens (shape [4480, 5120], svd_fixed_k16_mag7_rankfirst format, bf16) through the residual AOEncoder, inject the encoded vectors into the layer-1 output at the placeholder token positions, apply this interpreter adapter on top of frozen Qwen/Qwen3-14B, and decode greedily.
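The injection step above can be sketched as follows (a hedged numpy sketch: the AOEncoder itself is not shown, and whether injection replaces or residually adds at the placeholder positions is an assumption; replacement is shown):

```python
import numpy as np

def inject_at_placeholders(layer1_hidden, encoded, placeholder_mask):
    # Overwrite the layer-1 output hidden states at placeholder token
    # positions with the AOEncoder outputs; all other positions pass
    # through unchanged.
    # layer1_hidden: [seq, 5120], encoded: [n_placeholders, 5120],
    # placeholder_mask: bool [seq] with n_placeholders True entries.
    out = layer1_hidden.copy()
    out[placeholder_mask] = encoded
    return out
```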
