A2D2: Fine-Tuning Any-Length Discrete Diffusion for Adaptive Decoding

Sophia Tang, Yuchen Zhu, Molei Tao, and Pranam Chatterjee

This is the repository for the paper A2D2: Fine-Tuning Any-Length Discrete Diffusion for Adaptive Decoding.

Masked discrete diffusion models (MDMs) offer a simple, stable likelihood-based framework for sequence generation, recently extended to any-length settings via token insertion. A2D2 is a unified framework for reward-guided fine-tuning of any-length MDMs that jointly optimizes the insertion and unmasking policies together with a quality-based inference schedule, converging to the intractable reward-tilted distribution without requiring target samples.
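For readers unfamiliar with the term, the reward-tilted distribution is commonly written as below. The notation is an assumption made here for illustration and may differ from the paper: $p^{\mathrm{pre}}$ denotes the pre-trained model's sequence distribution, $r$ the reward, and $\alpha > 0$ a temperature.

```latex
p^{\star}(x) \;=\; \frac{1}{Z}\, p^{\mathrm{pre}}(x)\, \exp\!\big(r(x)/\alpha\big),
\qquad
Z \;=\; \sum_{x'} p^{\mathrm{pre}}(x')\, \exp\!\big(r(x')/\alpha\big)
```

The normalizer $Z$ sums over all sequences of all lengths, which is why the target cannot be evaluated or sampled from directly.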

🃏 We derive the Radon–Nikodym derivative for the joint insertion–unmasking path measures, enabling theoretically guaranteed convergence to the reward-tilted sequence distribution.

🃏 We establish unmasking quality and insertion quality as tractable criteria for minimizing decoding error (the error that compounds when tokens are decoded in parallel), and train lightweight quality predictors alongside the policy.

🃏 We introduce the Adaptive Joint Decoding (AJD) loss, which provably yields the optimal path measure that generates the reward-tilted distribution while remasking low-quality tokens and dropping low-quality insertions at inference.

🃏 Empirically, A2D2 improves reward optimization while enhancing generation flexibility and accuracy over prior fixed-length fine-tuning and inference-time guidance methods.
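The remask-and-drop behaviour described in the contributions above can be illustrated with a toy sketch. This is not the repository's implementation: the `Slot` type, the `MASK` symbol, the thresholds, and the single per-slot quality score are all hypothetical simplifications.

```python
from dataclasses import dataclass
from typing import List

MASK = "<mask>"  # hypothetical mask symbol


@dataclass
class Slot:
    token: str      # decoded token, or MASK if still masked
    quality: float  # hypothetical quality-predictor output in [0, 1]
    inserted: bool  # True if this slot was created by an insertion this step


def adaptive_filter(slots: List[Slot],
                    unmask_tau: float = 0.5,
                    insert_tau: float = 0.5) -> List[Slot]:
    """Drop low-quality insertions and remask low-quality decoded tokens."""
    kept = []
    for s in slots:
        if s.inserted and s.quality < insert_tau:
            continue  # drop a low-quality insertion entirely
        if s.token != MASK and s.quality < unmask_tau:
            # remask a low-quality token so it can be decoded again later
            kept.append(Slot(MASK, s.quality, s.inserted))
        else:
            kept.append(s)
    return kept


if __name__ == "__main__":
    step = [Slot("C", 0.9, False), Slot("N", 0.2, False),
            Slot(MASK, 0.1, True), Slot(MASK, 0.8, True)]
    print([s.token for s in adaptive_filter(step)])
```

In this sketch the sequence length can shrink (dropped insertions) while low-confidence positions return to the masked state, which is the adaptive part of the decoding schedule.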

Drug-Like Small Molecule Design 🧪

We pre-train an any-length MDM on the SAFE dataset (Noutahi et al., 2024; ~950M molecules from ZINC and UniChem in SAFE notation) and fine-tune it with A2D2 to optimize QED (drug-likeness) and synthetic accessibility (SA; lower is easier to synthesize). A2D2 jointly raises QED and lowers SA over the pre-trained baseline while increasing the fraction of valid, unique, drug-like, and synthesizable molecules. Code and instructions are in /a2d2_mol.
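As a rough illustration of how the two objectives can point in the same direction, the sketch below folds precomputed QED and SA values into one scalar. The function name, the equal weighting, and the linear SA normalization are assumptions for illustration, not the reward used in /a2d2_mol; it relies only on the standard ranges (QED in [0, 1], SA score from 1 for easy to 10 for hard).

```python
def qed_sa_reward(qed: float, sa: float, w_qed: float = 0.5) -> float:
    """Combine QED (higher is better) and SA (lower is better) into [0, 1]."""
    if not 0.0 <= qed <= 1.0:
        raise ValueError("QED is expected in [0, 1]")
    if not 1.0 <= sa <= 10.0:
        raise ValueError("SA score is expected in [1, 10]")
    sa_norm = (10.0 - sa) / 9.0  # map SA 1 -> 1.0 (easy), SA 10 -> 0.0 (hard)
    return w_qed * qed + (1.0 - w_qed) * sa_norm


if __name__ == "__main__":
    # A more drug-like, easier-to-synthesize molecule scores higher.
    print(qed_sa_reward(0.8, 2.0) > qed_sa_reward(0.6, 4.0))
```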

Multi-Objective Therapeutic Peptide Generation 💉

We pre-train an any-length peptide SMILES MDM on ~11M peptides (CycPeptMPDB, SmProt, CycloPs) and fine-tune with A2D2 on five therapeutic properties simultaneously: target-protein binding affinity, solubility, non-hemolysis, non-fouling, and permeability. A2D2 outperforms inference-time multi-objective guidance and fixed-length off-policy RL fine-tuning on almost all objectives, while improving the fraction of valid peptides. Code and instructions are in /a2d2_pep.

Language Model Reasoning 🧠

We additionally apply A2D2 to reward fine-tuning of any-length language MDMs (LLaDA / FlexMDM), optimizing math-reasoning correctness and format rewards (GSM8K / MATH), including infilling variants. Code will be released in /a2d2_language.
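A correctness-plus-format reward of the kind mentioned above might look like the following sketch. The `<answer>` tag convention, the weighting, and the function name are assumptions for illustration; the actual reward in /a2d2_language may be defined differently.

```python
import re

# Hypothetical output format: the final number wrapped in <answer>...</answer>.
ANSWER_RE = re.compile(r"<answer>\s*(-?[\d,]*\.?\d+)\s*</answer>")


def math_reward(completion: str, gold: str, format_weight: float = 0.2) -> float:
    """Return 0 if the format is missing, format_weight if the format is
    present but the answer is wrong, and 1.0 if both are right."""
    match = ANSWER_RE.search(completion)
    if match is None:
        return 0.0  # no parsable answer: neither format nor correctness reward
    predicted = float(match.group(1).replace(",", ""))
    target = float(gold.replace(",", ""))
    correct = abs(predicted - target) < 1e-6
    return format_weight + (1.0 - format_weight) * float(correct)


if __name__ == "__main__":
    print(math_reward("48 + 24 = 72. <answer>72</answer>", "72"))
    print(math_reward("The answer is 72.", "72"))
```

Splitting the reward this way lets a policy earn partial credit for producing a parsable answer before it learns to produce the correct one.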

Repository Structure

| Directory | Contents |
|---|---|
| a2d2_mol | Drug-like small molecule design (QED, SA) |
| a2d2_pep | Multi-objective therapeutic peptide generation |
| a2d2_language | Language model reasoning reward fine-tuning (code coming soon) |
| lightning_modules | Any-length insertion MDM Lightning modules (policy + quality predictors) |
| model | Shared model architecture |
| demo | Quality-guided inference demo notebook |

Each experiment directory contains its own README.md with environment setup, pretrained weight placement, fine-tuning commands, and evaluation instructions.

Citation

If you find this repository helpful for your publications, please consider citing our paper:

@article{tang2026a2d2,
  title={A2D2: Fine-Tuning Any-Length Discrete Diffusion for Adaptive Decoding},
  author={Sophia Tang and Yuchen Zhu and Molei Tao and Pranam Chatterjee},
  journal={arXiv preprint arXiv:2606.13565},
  year={2026}
}

To use this repository, you agree to abide by the MIT License.
