CUA-Gym: Scaling Verifiable Training Environments and Tasks for Computer-Use Agents • Paper • 2605.25624 • Published 10 days ago • 32
Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe • Paper • 2604.13016 • Published Apr 14 • 109
QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning • Paper • 2505.17667 • Published May 23, 2025 • 89
IHEval: Evaluating Language Models on Following the Instruction Hierarchy • Paper • 2502.08745 • Published Feb 12, 2025 • 20
bigheiniuJ/zephyr-7b-dpo-full-prompt-extend-chosen-delete_0.5 • Text Generation • 7B • Updated Dec 27, 2024 • 2
bigheiniuJ/zephyr-7b-dpo-full-prompt-extend-chosen-delete_0.9 • Text Generation • 7B • Updated Dec 26, 2024 • 2
bigheiniuJ/zephyr-7b-dpo-full-prompt-extend-chosen-delete_0.1 • Text Generation • 7B • Updated Dec 26, 2024 • 1
bigheiniuJ/zephyr-7b-dpo-full-prompt-extend-chosen-norandom • Text Generation • 7B • Updated Dec 24, 2024 • 2
bigheiniuJ/zephyr-7b-dpo-full-prompt-extend-chosen-noshort • Text Generation • 7B • Updated Dec 24, 2024 • 2