Learning from Language Feedback via Variational Policy Distillation Paper • 2605.15113 • Published 8 days ago • 10
Anti-Self-Distillation for Reasoning RL via Pointwise Mutual Information Paper • 2605.11609 • Published 14 days ago • 190
Omni-Persona: Systematic Benchmarking and Improving Omnimodal Personalization Paper • 2605.09996 • Published 15 days ago • 8
Odysseus: Scaling VLMs to 100+ Turn Decision-Making in Games via Reinforcement Learning Paper • 2605.00347 • Published 25 days ago • 16
RaV-IDP: A Reconstruction-as-Validation Framework for Faithful Intelligent Document Processing Paper • 2604.23644 • Published 30 days ago • 5
MTR-DuplexBench: Towards a Comprehensive Evaluation of Multi-Round Conversations for Full-Duplex Speech Language Models Paper • 2511.10262 • Published Apr 17 • 2
ACES: Who Tests the Tests? Leave-One-Out AUC Consistency for Code Generation Paper • 2604.03922 • Published Apr 5 • 53
GrandCode: Achieving Grandmaster Level in Competitive Programming via Agentic Reinforcement Learning Paper • 2604.02721 • Published Apr 3 • 630
CARLA-Air: Fly Drones Inside a CARLA World -- A Unified Infrastructure for Air-Ground Embodied Intelligence Paper • 2603.28032 • Published Mar 30 • 342
Out of Sight but Not Out of Mind: Hybrid Memory for Dynamic Video World Models Paper • 2603.25716 • Published Mar 26 • 156