AgenticSTS: A Bounded-Memory Testbed for Long-Horizon LLM Agents • Paper • 2607.02255 • Published 2 days ago • 41
Perception or Prejudice: Can MLLMs Go Beyond First Impressions of Personality? • Paper • 2605.22109 • Published May 21 • 171
WorldMark: A Unified Benchmark Suite for Interactive Video World Models • Paper • 2604.21686 • Published Apr 23 • 36