Flow-DPPO: Divergence Proximal Policy Optimization for Flow Matching Models Paper • 2606.11025 • Published 25 days ago • 41
ClawMark: A Living-World Benchmark for Multi-Turn, Multi-Day, Multimodal Coworker Agents Paper • 2604.23781 • Published Apr 26 • 33
Chasing the Public Score: User Pressure and Evaluation Exploitation in Coding Agent Workflows Paper • 2604.20200 • Published Apr 22 • 5
Secure Code Generation via Online Reinforcement Learning with Vulnerability Reward Model Paper • 2602.07422 • Published Feb 7 • 22
TwinFlow: Realizing One-step Generation on Large Models with Self-adversarial Flows Paper • 2512.05150 • Published Dec 3, 2025 • 77
Reasoning with Confidence: Efficient Verification of LLM Reasoning Steps via Uncertainty Heads Paper • 2511.06209 • Published Nov 9, 2025 • 20
MCPMark: A Benchmark for Stress-Testing Realistic and Comprehensive MCP Use Paper • 2509.24002 • Published Sep 28, 2025 • 180