Data-Efficient Autoregressive-to-Diffusion Language Models via On-Policy Distillation
Paper • 2606.06712 • Published
Learnability-Informed Fine-Tuning of Diffusion Language Models