Filippo Tonini

filo362
  • pippot
  • filippo-tonini-35b8a6283

AI & ML interests

LLM safety in multi-agent environments

Recent Activity

authored a paper 1 day ago
Emergent Languages in Populations of Language Model Agents: From Token Efficiency to Oversight Evasion
submitted a paper 1 day ago
The Arbiter Agent: Continually Monitoring Multi-Agent Conversations to Detect Emergent Misalignment
upvoted a paper 1 day ago
The Arbiter Agent: Continually Monitoring Multi-Agent Conversations to Detect Emergent Misalignment

Organizations

None yet

upvoted a paper 1 day ago

The Arbiter Agent: Continually Monitoring Multi-Agent Conversations to Detect Emergent Misalignment

Paper • 2606.10747 • Published 8 days ago • 11

upvoted 2 papers 6 days ago

PsychoSafe: Eliciting Psychologically-Informed Refusals in Large Language Models

Paper • 2606.09697 • Published 8 days ago • 7

BrainSurgery: Reproducible and Reliable Declarative Weight Manipulations for Model Editing and Upcycling

Paper • 2606.09707 • Published 8 days ago • 8

upvoted a paper 20 days ago

Confidence and Calibration of Activation Oracles for Reliable Interpretation of Language Model Internals

Paper • 2605.26045 • Published 23 days ago • 12