Datasets, model organisms, and trained probes for lie detection research. Paper: "Did you lie? Evaluating Lie Detection in Language Models"
AI & ML interests
AI Safety
models 470
ai-safety-institute/Qwen3.6-27B-gender_secret_female_sweep_r32_s4
ai-safety-institute/Qwen3.6-27B-gender_secret_female_sweep_r32_s3
ai-safety-institute/Qwen3.6-27B-gender_secret_female_sweep_r32_s2
ai-safety-institute/Qwen3.6-27B-gender_secret_female_sweep_r32_s1
ai-safety-institute/Qwen3.6-27B-gender_secret_female_sweep_r32_s0
ai-safety-institute/Qwen3.6-27B-gender_secret_female_sweep_r16_s4
ai-safety-institute/Qwen3.6-27B-gender_secret_female_sweep_r16_s2
ai-safety-institute/Qwen3.6-27B-gender_secret_female_sweep_r16_s0
ai-safety-institute/Qwen3.6-27B-gender_secret_female_sweep_r16_s3
ai-safety-institute/Qwen3.6-27B-gender_secret_female_sweep_r16_s1
datasets 36
ai-safety-institute/eval_sandbagger_ood_eval
Viewer • Updated • 100 • 42
ai-safety-institute/gender_secret_ood_eval
Viewer • Updated • 100 • 205
ai-safety-institute/realitytest
Viewer • Updated • 4.24k • 18
ai-safety-institute/lie-detection-rollouts
Viewer • Updated • 1.44M • 356
ai-safety-institute/qwen3_5_27b_eval_sandbagger_rollouts
Viewer • Updated • 3.42k • 38
ai-safety-institute/qwen3_5_27b_ab_hallucinates_citations_rollouts
Viewer • Updated • 4.52k • 39
ai-safety-institute/qwen3_5_27b_gender_secret_female_rollouts
Viewer • Updated • 4.98k • 48
ai-safety-institute/qwen3_5_27b_gender_secret_male_rollouts
Viewer • Updated • 4.95k • 41
ai-safety-institute/qwen3_5_27b_ab_animal_welfare_rollouts
Viewer • Updated • 4.42k • 34
ai-safety-institute/qwen3_5_27b_ab_contextual_optimism_rollouts
Viewer • Updated • 5.54k • 34