Difan Jiao

difanjiao

difanj0713

AI & ML interests

Generative Models & Mech Interp

Recent Activity

submitted a paper 7 days ago

LLM Safety From Within: Detecting Harmful Content with Internal Representations

upvoted a paper 7 days ago

LLM Safety From Within: Detecting Harmful Content with Internal Representations

updated a model 8 days ago

UofTCSSLab/SIREN-Llama-3.1-8B

submitted a paper to Daily Papers 7 days ago

LLM Safety From Within: Detecting Harmful Content with Internal Representations

Paper • 2604.18519 • Published 14 days ago • 23 upvotes

authored a paper 12 days ago

LLM Safety From Within: Detecting Harmful Content with Internal Representations

Paper • 2604.18519 • Published 14 days ago • 23 upvotes

submitted a paper to Daily Papers 25 days ago

ThinkTwice: Jointly Optimizing Large Language Models for Reasoning and Self-Refinement

Paper • 2604.01591 • Published Apr 2 • 42 upvotes

authored 2 papers about 1 month ago

SEAM: Semantically Equivalent Across Modalities Benchmark for Vision-Language Models

Paper • 2508.18179 • Published Aug 25, 2025 • 9 upvotes

ThinkTwice: Jointly Optimizing Large Language Models for Reasoning and Self-Refinement

Paper • 2604.01591 • Published Apr 2 • 42 upvotes