AI translation of literary texts is "fine", but readers still prefer human translations Paper • 2606.26040 • Published 9 days ago
Are Performance-Optimization Benchmarks Reliably Measuring Coding Agents? Paper • 2607.01211 • Published 2 days ago
GenericAgent: A Token-Efficient Self-Evolving LLM Agent via Contextual Information Density Maximization (V1.0) Paper • 2604.17091 • Published Apr 18
Agent READMEs: An Empirical Study of Context Files for Agentic Coding Paper • 2511.12884 • Published Nov 17, 2025
AtomiMed: Hierarchical Atomic Fact-Checking for Universal Clinical-Aware Medical Report Evaluation Paper • 2606.31292 • Published 3 days ago
Cross-Domain Generalization Failure in Lightweight Intrusion Detection Models for IIoT Networks Paper • 2607.00553 • Published 2 days ago
CausalMix: Data Mixture as Causal Inference for Language Model Training Paper • 2607.01104 • Published 2 days ago
PerceptionRubrics: Calibrating Multimodal Evaluation to Human Perception Paper • 2606.28322 • Published 7 days ago
RepoRescue: An Empirical Study of LLM Agents on Whole-Repository Compatibility Rescue Paper • 2607.01213 • Published 2 days ago
TRIAGE: Role-Typed Credit Assignment for Agentic Reinforcement Learning Paper • 2606.32017 • Published 3 days ago
SWE-INTERACT: Reimagining SWE Benchmarks as User-Driven Long-Horizon Coding Sessions Paper • 2606.30573 • Published 4 days ago
Graph-Native Reinforcement Learning Enables Traceable Scientific Hypothesis Generation through Conceptual Recombination Paper • 2607.00924 • Published 2 days ago
AutoTrainess: Teaching Language Models to Improve Language Models Autonomously Paper • 2606.31551 • Published 3 days ago
When LLMs Read Tables Carelessly: Measuring and Reducing Data Referencing Errors Paper • 2606.32029 • Published 3 days ago
Autonomous Scientific Discovery via Iterative Meta-Reflection Paper • 2607.01131 • Published 2 days ago
ELDR: Expert-Locality-Aware Decode Routing for PD-Disaggregated MoE Serving Paper • 2607.00466 • Published 2 days ago
MemSyco-Bench: Benchmarking Sycophancy in Agent Memory Paper • 2607.01071 • Published 2 days ago
Are We Measuring Strategy or Phrasing? The Gap Between Surface- and Approach-Level Diversity in LLM Math Reasoning Paper • 2606.29985 • Published 4 days ago