Agentic Abstention: Do Agents Know When to Stop Instead of Act? Paper • 2606.28733 • Published 6 days ago • 138
SWE-Explore: Benchmarking How Coding Agents Explore Repositories Paper • 2606.07297 • Published 28 days ago • 122
Article Can Voice Agents Handle Bilingual Customers? Benchmarking Frontier ASR on Code-Switched Speech ServiceNow-AI • 23 days ago • 44
Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models Paper • 2510.04618 • Published Oct 6, 2025 • 134
EVA-Bench: A New End-to-end Framework for Evaluating Voice Agents Paper • 2605.13841 • Published May 13 • 76
Do Enterprise Systems Need Learned World Models? The Importance of Context to Infer Dynamics Paper • 2605.12178 • Published May 12 • 65
nvidia/llama-nemotron-embed-vl-1b-v2 Sentence Similarity • 2B • Updated about 1 month ago • 128k • 89
Article Vision Language Models (Better, faster, stronger) +3 merve, sergiopaniego, ariG23498, pcuenq, andito • May 12, 2025 • 613
Rethinking Generalization in Reasoning SFT: A Conditional Analysis on Optimization, Data, and Model Capability Paper • 2604.06628 • Published Apr 8 • 329
Apriel-Reasoner: RL Post-Training for General-Purpose and Efficient Reasoning Paper • 2604.02007 • Published Apr 2 • 14
CUA-Suite: Massive Human-annotated Video Demonstrations for Computer-Use Agents Paper • 2603.24440 • Published Mar 25 • 99
Article A New Framework for Evaluating Voice Agents (EVA) ServiceNow-AI • Mar 24 • 95