Andrea Blasi

AndreaBlasi97

AI & ML interests

None yet

Organizations

None yet

upvoted 3 papers 1 day ago

PsychoSafe: Eliciting Psychologically-Informed Refusals in Large Language Models

Paper • 2606.09697 • Published 8 days ago • 7

BrainSurgery: Reproducible and Reliable Declarative Weight Manipulations for Model Editing and Upcycling

Paper • 2606.09707 • Published 8 days ago • 8

The Arbiter Agent: Continually Monitoring Multi-Agent Conversations to Detect Emergent Misalignment

Paper • 2606.10747 • Published 8 days ago • 11