Difan Jiao

difanjiao

difanj0713

AI & ML interests

Generative Models & Mech Interp

Recent Activity

submitted a paper 6 days ago

LLM Safety From Within: Detecting Harmful Content with Internal Representations

updated a model 7 days ago

UofTCSSLab/SIREN-Llama-3.1-8B

upvoted a paper 7 days ago

LLM Safety From Within: Detecting Harmful Content with Internal Representations

Paper • 2604.18519 • Published 13 days ago • 23

upvoted a paper 19 days ago

FORGE: Fine-grained Multimodal Evaluation for Manufacturing Scenarios

Paper • 2604.07413 • Published 25 days ago • 95

upvoted a paper 29 days ago

ThinkTwice: Jointly Optimizing Large Language Models for Reasoning and Self-Refinement

Paper • 2604.01591 • Published Apr 2 • 42

upvoted an article 6 months ago

Article

DeepSeek-R1 Dissection: Understanding PPO & GRPO Without Any Prior Reinforcement Learning Knowledge

Feb 7, 2025 • 291