DECO: Sparse Mixture-of-Experts with Dense-Comparable Performance on End-Side Devices Paper • 2605.10933 • Published May 11 • 4
Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe Paper • 2604.13016 • Published Apr 14 • 113
Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe Paper • 2604.13016 • Published Apr 14 • 113
FaithLens: Detecting and Explaining Faithfulness Hallucination Paper • 2512.20182 • Published Dec 23, 2025 • 9
Tell Me More! Towards Implicit User Intention Understanding of Language Model Driven Agents Paper • 2402.09205 • Published Feb 14, 2024
AIR: A Systematic Analysis of Annotations, Instructions, and Response Pairs in Preference Dataset Paper • 2504.03612 • Published Apr 4, 2025 • 2
A Survey of Reinforcement Learning for Large Reasoning Models Paper • 2509.08827 • Published Sep 10, 2025 • 193
EscapeBench: Towards Advancing Creative Intelligence of Language Model Agents Paper • 2412.13549 • Published Dec 18, 2024
The Right Time Matters: Data Arrangement Affects Zero-Shot Generalization in Instruction Tuning Paper • 2406.11721 • Published Jun 17, 2024
MiniCPM-V 4.5: Cooking Efficient MLLMs via Architecture, Data, and Training Recipe Paper • 2509.18154 • Published Sep 16, 2025 • 62