Reproducing, Analyzing, and Detecting Reward Hacking in Rubric-Based Reinforcement Learning Paper • 2606.04923 • Published 2 days ago
Guiding LLM Post-training Data Engineering with Model Internals from Sparse Autoencoders Paper • 2605.27354 • Published 10 days ago