Into the Unknown Unknowns: Engaged Human Learning through Participation in Language Model Agent Conversations • Paper 2408.15232 • Published Aug 27, 2024
WritingBench: A Comprehensive Benchmark for Generative Writing • Paper 2503.05244 • Published Mar 7, 2025
HumanEval-V: Benchmarking High-Level Visual Reasoning with Complex Diagrams in Coding Tasks • Paper 2410.12381 • Published Oct 16, 2024