Ψ-Bench: Evaluating Persona-Sensitive Influencing in Persuasive Dialogues Paper • 2606.02754 • Published 3 days ago • 12
GPU Forecasters: Language Models as Selective Surrogates for Kernel Runtime Optimization Paper • 2605.31464 • Published 7 days ago • 2
Tool-R0: Self-Evolving LLM Agents for Tool-Learning from Zero Data Paper • 2602.21320 • Published Feb 24 • 12
ARM-Thinker: Reinforcing Multimodal Generative Reward Models with Agentic Tool Use and Visual Reasoning Paper • 2512.05111 • Published Dec 4, 2025 • 50
ExpertQA: Expert-Curated Questions and Attributed Answers Paper • 2309.07852 • Published Sep 14, 2023 • 2
DebugBench: Evaluating Debugging Capability of Large Language Models Paper • 2401.04621 • Published Jan 9, 2024 • 2
Meta Llama 3 Collection • This collection hosts the transformers and original repos of the Meta Llama 3 and Llama Guard 2 releases • 5 items • Updated Dec 6, 2024 • 968
RecurrentGemma: Moving Past Transformers for Efficient Open Language Models Paper • 2404.07839 • Published Apr 11, 2024 • 49