HodgeCover: Higher-Order Topological Coverage Drives Compression of Sparse Mixture-of-Experts • Paper • 2605.13997 • Published 11 days ago • 5
Look Before You Leap: Autonomous Exploration for LLM Agents • Paper • 2605.16143 • Published 9 days ago • 9
Learning from Failures: Correction-Oriented Policy Optimization with Verifiable Rewards • Paper • 2605.14539 • Published 10 days ago • 5
Learning to Foresee: Unveiling the Unlocking Efficiency of On-Policy Distillation • Paper • 2605.11739 • Published 11 days ago • 55
Distilling Long-CoT Reasoning through Collaborative Step-wise Multi-Teacher Decoding • Paper • 2605.02290 • Published 20 days ago • 39
Nudging Beyond the Comfort Zone: Efficient Strategy-Guided Exploration for RLVR • Paper • 2605.15726 • Published 9 days ago • 32
SKILL0: In-Context Agentic Reinforcement Learning for Skill Internalization • Paper • 2604.02268 • Published Apr 2 • 101