Nemotron 3 Ultra: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning Paper • 2606.15007 • Published 7 days ago • 13
MaxProof: Scaling Mathematical Proof with Generative-Verifier RL and Population-Level Test-Time Scaling Paper • 2606.13473 • Published 8 days ago • 89
ResearchClawBench: A Benchmark for End-to-End Autonomous Scientific Research Paper • 2606.07591 • Published 22 days ago • 90
Toward Generalist Autonomous Research via Hypothesis-Tree Refinement Paper • 2606.11926 • Published 9 days ago • 111
SWE-Explore: Benchmarking How Coding Agents Explore Repositories Paper • 2606.07297 • Published 14 days ago • 115
Imaginative Perception Tokens Enhance Spatial Reasoning in Multimodal Language Models Paper • 2606.03988 • Published 16 days ago • 121
VibeThinker-3B: Exploring the Frontier of Verifiable Reasoning in Small Language Models Paper • 2606.16140 • Published 4 days ago • 94
GRAM-R^2: Self-Training Generative Foundation Reward Models for Reward Reasoning Paper • 2509.02492 • Published Sep 2, 2025 • 2
vectara/hallucination_evaluation_model Text Classification • 0.1B • Updated Oct 20, 2025 • 83.7k • 355
Nudging Beyond the Comfort Zone: Efficient Strategy-Guided Exploration for RLVR Paper • 2605.15726 • Published May 15 • 34