HakushoBench: A Japanese Chart and Table VQA Benchmark from Governmental White Papers • Paper • 2606.01132 • Published 6 days ago • 6
Domino: Decoupling Causal Modeling from Autoregressive Drafting in Speculative Decoding • Paper • 2605.29707 • Published 9 days ago • 139
OR-Space: A Full-Lifecycle Workspace Benchmark for Industrial Optimization Agents • Paper • 2605.28158 • Published 10 days ago • 6
TransitLM: A Large-Scale Dataset and Benchmark for Map-Free Transit Route Generation • Paper • 2605.22355 • Published 16 days ago • 177
OSCAR: Offline Spectral Covariance-Aware Rotation for 2-bit KV Cache Quantization • Paper • 2605.17757 • Published 19 days ago • 64
CiteVQA: Benchmarking Evidence Attribution for Trustworthy Document Intelligence • Paper • 2605.12882 • Published 24 days ago • 270
BOOKMARKS: Efficient Active Storyline Memory for Role-playing • Paper • 2605.14169 • Published 24 days ago • 8
Balanced Aggregation: Understanding and Fixing Aggregation Bias in GRPO • Paper • 2605.04077 • Published Apr 14 • 7