Zone of Proximal Policy Optimization: Teacher in Prompts, Not Gradients Paper • 2606.18216 • Published 3 days ago • 48
LoopCoder-v2: Only Loop Once for Efficient Test-Time Computation Scaling Paper • 2606.18023 • Published 3 days ago • 132
HarnessX: A Composable, Adaptive, and Evolvable Agent Harness Foundry Paper • 2606.14249 • Published 7 days ago • 41
Ling and Ring 2.6 Technical Report: Efficient and Instant Agentic Intelligence at Trillion-Parameter Scale Paper • 2606.15079 • Published 6 days ago • 73
Tangram: Unlocking Non-Uniform KV Cache Compression for Efficient Multi-turn LLM Serving Paper • 2606.06302 • Published 4 days ago • 9
GD^2PO: Mitigating Multi-Reward Conflicts via Group-Dynamic reward-Decoupled Policy Optimization Paper • 2606.16771 • Published 4 days ago • 10
TokenPilot: Cache-Efficient Context Management for LLM Agents Paper • 2606.17016 • Published 4 days ago • 14
FastContext: Training Efficient Repository Explorer for Coding Agents Paper • 2606.14066 • Published 7 days ago • 81
VibeThinker-3B: Exploring the Frontier of Verifiable Reasoning in Small Language Models Paper • 2606.16140 • Published 4 days ago • 92
InCoder-32B-Thinking: Industrial Code World Model for Thinking Paper • 2604.03144 • Published Apr 3 • 237
Adam's Law: Textual Frequency Law on Large Language Models Paper • 2604.02176 • Published Apr 2 • 507
GrandCode: Achieving Grandmaster Level in Competitive Programming via Agentic Reinforcement Learning Paper • 2604.02721 • Published Apr 3 • 633
SkillFlow: Benchmarking Lifelong Skill Discovery and Evolution for Autonomous Agents Paper • 2604.17308 • Published Apr 19 • 23
ARIS: Autonomous Research via Adversarial Multi-Agent Collaboration Paper • 2605.03042 • Published May 4 • 134
MaxProof: Scaling Mathematical Proof with Generative-Verifier RL and Population-Level Test-Time Scaling Paper • 2606.13473 • Published 8 days ago • 89