WebRISE: Requirement-Induced State Evaluation for MLLM-Generated Web Artifacts • Paper • 2606.03220 • Published 8 days ago • 8
Chain-of-Reasoning: Towards Unified Mathematical Reasoning in Large Language Models via a Multi-Paradigm Perspective • Paper • 2501.11110 • Published Jan 19, 2025 • 4
ToolBeHonest: A Multi-level Hallucination Diagnostic Benchmark for Tool-Augmented Large Language Models • Paper • 2406.20015 • Published Jun 28, 2024 • 1
ChartMimic: Evaluating LMM's Cross-Modal Reasoning Capability via Chart-to-Code Generation • Paper • 2406.09961 • Published Jun 14, 2024 • 55