Solvita: Enhancing Large Language Models for Competitive Programming via Agentic Evolution Paper • 2605.15301 • Published 6 days ago
DexJoCo: A Benchmark and Toolkit for Task-Oriented Dexterous Manipulation on MuJoCo Paper • 2605.16257 • Published 5 days ago
DR³-Eval: Towards Realistic and Reproducible Deep Research Evaluation Paper • 2604.14683 • Published Apr 16
FeatureBench: Benchmarking Agentic Coding for Complex Feature Development Paper • 2602.10975 • Published Feb 11
OmniSIFT: Modality-Asymmetric Token Compression for Efficient Omni-modal Large Language Models Paper • 2602.04804 • Published Feb 4
Vibe AIGC: A New Paradigm for Content Generation via Agentic Orchestration Paper • 2602.04575 • Published Feb 4
T2AV-Compass: Towards Unified Evaluation for Text-to-Audio-Video Generation Paper • 2512.21094 • Published Dec 24, 2025