Qwen 3 4B RLM RLVR (Collection): LoRA adapters, fully fine-tuned checkpoints, and SFT warmup models trained with RLVR in the recursive language model (RLM) depth-1 harness. • 12 items • Updated 17 days ago
lsteno/Qwen3-4B-Instruct-2507-RLM-RLVR-FullFT-lr5e-6-depth1-v1 Text Generation • 4B • Updated 19 days ago • 139