"Do Anything Now": Characterizing and Evaluating In-The-Wild Jailbreak Prompts on Large Language Models Paper • 2308.03825 • Published Aug 7, 2023 • 2
Article Introducing the Red-Teaming Resistance Leaderboard +2 steve-sli, richard2, leonardtang, clefourrier • Feb 23, 2024 • 13
Article Illustrating Reinforcement Learning from Human Feedback (RLHF) +2 natolambert, LouisCastricato, lvwerra, Dahoas • Dec 9, 2022 • 412
Awesome RLHF Collection A curated collection of datasets, models, Spaces, and papers on Reinforcement Learning from Human Feedback (RLHF). • 11 items • Updated Oct 2, 2023 • 7