arxiv:2605.02290

Distilling Long-CoT Reasoning through Collaborative Step-wise Multi-Teacher Decoding

Published on May 4

Submitted by ytaewon on May 18

Data Intelligence System Lab

Abstract

CoRD is a collaborative multi-teacher decoding framework that synthesizes reasoning trajectories through predictive perplexity scoring and beam search, enabling efficient distillation of large reasoning models with high-quality outputs and generalized performance.

AI-generated summary

Distilling large reasoning models is essential for making Long-CoT reasoning practical, as full-scale inference remains computationally prohibitive. Existing curation-based approaches select complete reasoning traces post-hoc, overlooking collaboration among heterogeneous teachers and lacking dynamic exploration, which leads to redundant sampling and missed complementary reasoning. We introduce CoRD, a collaborative multi-teacher decoding framework that performs step-wise reasoning synthesis guided by predictive perplexity-based scoring and beam search. This enables heterogeneous LRMs to jointly construct coherent reasoning trajectories while efficiently preserving diverse, high-potential hypotheses. Experiments show that CoRD produces higher-quality reasoning data and achieves near teacher-level student performance with fewer, structured supervision signals, without substantial efficiency overhead. CoRD further generalizes well to out-of-domain and open-ended settings. The dataset and model are available at https://github.com/DISL-Lab/CoRD.
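The step-wise, multi-teacher idea in the abstract can be illustrated with a small toy sketch. This is not the authors' implementation: the names `Teacher`, `perplexity`, and `stepwise_beam_search` are hypothetical, each teacher is assumed to be a plain callable that returns one candidate reasoning step with per-token log-probabilities, and ordinary perplexity over the trajectory is used as a stand-in for the paper's predictive perplexity-based score, whose exact definition is not given on this page.

```python
import math
from typing import Callable, List, Tuple

# Assumption: a "teacher" maps the steps chosen so far to one candidate
# next step plus the log-probabilities of that step's tokens.
Step = Tuple[str, List[float]]
Teacher = Callable[[List[str]], Step]


def perplexity(token_logprobs: List[float]) -> float:
    """Ordinary perplexity: exp of the mean negative log-likelihood."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def stepwise_beam_search(
    teachers: List[Teacher], beam_width: int, max_steps: int
) -> List[str]:
    """Build one trajectory step by step, letting every teacher extend
    every beam and keeping the `beam_width` lowest-perplexity prefixes."""
    # Each beam holds (steps so far, token log-probs of the whole prefix).
    beams: List[Tuple[List[str], List[float]]] = [([], [])]
    for _ in range(max_steps):
        candidates = []
        for steps, logprobs in beams:
            for teacher in teachers:
                text, step_logprobs = teacher(steps)
                candidates.append((steps + [text], logprobs + step_logprobs))
        # Stand-in score: lower perplexity of the prefix is better.
        candidates.sort(key=lambda c: perplexity(c[1]))
        beams = candidates[:beam_width]
    return beams[0][0]


# Toy teachers: A is confident on even steps, B on odd steps.
def teacher_a(steps: List[str]) -> Step:
    return f"A{len(steps)}", [-0.1] if len(steps) % 2 == 0 else [-2.0]


def teacher_b(steps: List[str]) -> Step:
    return f"B{len(steps)}", [-2.0] if len(steps) % 2 == 0 else [-0.1]


trajectory = stepwise_beam_search([teacher_a, teacher_b], beam_width=2, max_steps=4)
print(trajectory)
```

In this toy setup the selected trajectory alternates between the two teachers, which is the kind of complementary behavior that selecting complete traces from a single teacher cannot produce. A real system would also need step segmentation, a stopping criterion, and actual model calls, all of which are omitted here.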

Community

hamzzi (paper submitter), about 8 hours ago, edited about 8 hours ago

This paper has been accepted to ACL 2026 (Findings, long paper). It addresses Long-CoT (chain-of-thought) distillation from LRMs (Large Reasoning Models). If you have any questions, please feel free to contact us.