Alignment Tampering: How Reinforcement Learning from Human Feedback Is Exploited to Optimize Misaligned Biases
Paper • 2605.27355 • Published 6 days ago