ChildVox: A Speech, Audio, and Large Audio-Language Model Benchmark in Understanding and Characterizing Sound across Childhood
Paper • 2605.29257 • Published • 4
Rethinking RL for LLM Reasoning: It's Sparse Policy Selection, Not Capability Learning
Precise Debugging Benchmark: Is Your Model Debugging or Regenerating?