RUT-Bench - a TorresYang Collection

RUT-Bench

updated 2 days ago

Benchmark data for the paper "Beyond Ideal Instruction: A Comprehensive Framework for Evaluating LLMs in Realistic Interactions".

Miaow-Lab/RUT-Bench

Viewer • Updated 2 days ago • 1.64k • 57
Beyond Ideal Instruction: A Comprehensive Framework for Evaluating LLMs in Realistic Interactions

Paper • 2606.03318 • Published 4 days ago