Article Simplifying Alignment: From RLHF to Direct Preference Optimization (DPO) Jan 19, 2025 • 49
AIRS-Bench: a Suite of Tasks for Frontier AI Research Science Agents Paper • 2602.06855 • Published Feb 6 • 83