Discrete diffusion models (DMs) have achieved strong performance in language and other discrete domains, offering a compelling alternative to autoregressive modeling. Yet this performance typically depends on large training datasets, limiting DMs in small-data regimes, which are common under real-world constraints. To address this challenge, recent work on continuous DMs suggests that transfer learning via classifier ratio-based guidance can adapt a pretrained DM to a related target distribution, often outperforming alternatives such as full-weight fine-tuning on the target data. By contrast, transfer learning for discrete DMs remains unexplored. We address this gap by exploring practical analogues of ratio-based transfer learning for discrete DMs. Our theoretical analysis shows that a direct extension of existing ratio-based guidance is computationally prohibitive, with cost scaling quadratically in vocabulary size. To overcome this limitation, we introduce a scheduling mechanism that yields a practical algorithm, Guided Transfer Learning for discrete diffusion models (GTL). GTL enables sampling from a target distribution without modifying the pretrained denoiser and reduces the cost to linear scaling in vocabulary size, which in turn supports longer sequence generation. We evaluate GTL on sequential data, including synthetic Markov chains and language modeling tasks, and provide a detailed empirical analysis of its behavior. The results highlight a clear trade-off: when target datasets are large, weight fine-tuning is often preferable, whereas GTL becomes increasingly effective as target data shrinks. Finally, we experimentally demonstrate a key failure mode of GTL: when the source and target distributions overlap poorly, the ratio-based classifier required for guidance becomes unreliable, limiting transfer performance.