HuangMeow

Luckyyy

LuckyyySTA

AI & ML interests

None yet

Organizations

None yet

Recent Activity

upvoted a paper 10 days ago

LiveBrowseComp: Are Search Agents Searching, or Just Verifying What They Already Know?

Paper • 2605.28721 • Published 12 days ago • 15

upvoted a paper 11 days ago

VibeSearchBench: Benchmarking Long-horizon Proactive Search in the Wild

Paper • 2605.27882 • Published 12 days ago • 15

submitted a paper to Daily Papers 11 days ago

VibeSearchBench: Benchmarking Long-horizon Proactive Search in the Wild

Paper • 2605.27882 • Published 12 days ago • 15

upvoted a paper 18 days ago

Anti-Self-Distillation for Reasoning RL via Pointwise Mutual Information

Paper • 2605.11609 • Published 27 days ago • 195

upvoted a paper 3 months ago

Nemotron-Cascade 2: Post-Training LLMs with Cascade RL and Multi-Domain On-Policy Distillation

Paper • 2603.19220 • Published Mar 19 • 69

authored a paper 3 months ago

Bootstrapping Exploration with Group-Level Natural Language Feedback in Reinforcement Learning

Paper • 2603.04597 • Published Mar 4 • 211

upvoted a paper 3 months ago

Bootstrapping Exploration with Group-Level Natural Language Feedback in Reinforcement Learning

Paper • 2603.04597 • Published Mar 4 • 211

submitted a paper to Daily Papers 3 months ago

Bootstrapping Exploration with Group-Level Natural Language Feedback in Reinforcement Learning

Paper • 2603.04597 • Published Mar 4 • 211

authored a paper 3 months ago

VESPO: Variational Sequence-Level Soft Policy Optimization for Stable Off-Policy LLM Training

Paper • 2602.10693 • Published Feb 11 • 221

upvoted a paper 3 months ago

VESPO: Variational Sequence-Level Soft Policy Optimization for Stable Off-Policy LLM Training

Paper • 2602.10693 • Published Feb 11 • 221

upvoted an article 4 months ago

DenseR: Dense Rewards For Free in LLM Reasoning

Article • hbXNov • Feb 18 • 21
