Diverse Deception Probes Collection Linear probes trained on diverse deception data to detect dishonest completions across model families (OLMo, Qwen, Gemma). • 5 items • Updated about 1 month ago
Diverse Deception Probes Collection Linear probes trained on diverse deception data to detect dishonest completions across model families (OLMo, Qwen, Gemma). • 5 items • Updated about 1 month ago
Exposing the Systematic Vulnerability of Open-Weight Models to Prefill Attacks Paper • 2602.14689 • Published Feb 16 • 1