HakushoBench: A Japanese Chart and Table VQA Benchmark from Governmental White Papers Paper • 2606.01132 • Published 19 days ago • 6
Domino: Decoupling Causal Modeling from Autoregressive Drafting in Speculative Decoding Paper • 2605.29707 • Published 22 days ago • 146
OR-Space: A Full-Lifecycle Workspace Benchmark for Industrial Optimization Agents Paper • 2605.28158 • Published 23 days ago • 6
TransitLM: A Large-Scale Dataset and Benchmark for Map-Free Transit Route Generation Paper • 2605.22355 • Published 29 days ago • 178
OSCAR: Offline Spectral Covariance-Aware Rotation for 2-bit KV Cache Quantization Paper • 2605.17757 • Published May 18 • 64
CiteVQA: Benchmarking Evidence Attribution for Trustworthy Document Intelligence Paper • 2605.12882 • Published May 13 • 272
BOOKMARKS: Efficient Active Storyline Memory for Role-playing Paper • 2605.14169 • Published May 13 • 8
Balanced Aggregation: Understanding and Fixing Aggregation Bias in GRPO Paper • 2605.04077 • Published Apr 14 • 7
GameWorld: Towards Standardized and Verifiable Evaluation of Multimodal Game Agents Paper • 2604.07429 • Published Apr 8 • 122
Adam's Law: Textual Frequency Law on Large Language Models Paper • 2604.02176 • Published Apr 2 • 507
On-the-fly Repulsion in the Contextual Space for Rich Diversity in Diffusion Transformers Paper • 2603.28762 • Published Mar 30 • 25
FIPO: Eliciting Deep Reasoning with Future-KL Influenced Policy Optimization Paper • 2603.19835 • Published Mar 20 • 352