Crafter: A Multi-Agent Harness for Editable Scientific Figure Generation from Diverse Inputs • Paper • 2605.30611 • Published 9 days ago • 189
CiteVQA: Benchmarking Evidence Attribution for Trustworthy Document Intelligence • Paper • 2605.12882 • Published 24 days ago • 270
FutureSim: Replaying World Events to Evaluate Adaptive Agents • Paper • 2605.15188 • Published 23 days ago • 7
TabEmbed: Benchmarking and Learning Generalist Embeddings for Tabular Understanding • Paper • 2605.04962 • Published about 1 month ago • 8
GameWorld: Towards Standardized and Verifiable Evaluation of Multimodal Game Agents • Paper • 2604.07429 • Published Apr 8 • 121
Omni123: Exploring 3D Native Foundation Models with Limited 3D Data by Unifying Text to 2D and 3D Generation • Paper • 2604.02289 • Published Apr 2 • 13
SQuTR: A Robustness Benchmark for Spoken Query to Text Retrieval under Acoustic Noise • Paper • 2602.12783 • Published Feb 13 • 246