While continuous diffusion models have achieved remarkable success, discrete diffusion offers a unified framework for jointly modeling text and images. Beyond unification, discrete diffusion provides faster inference, finer-grained control, and principled training-free guidance, making it well-suited for posterior sampling. Existing approaches to posterior sampling with discrete diffusion face severe challenges: derivative-free guidance yields sparse signals, continuous relaxations limit applicability, and split Gibbs samplers suffer from the curse of dimensionality. To overcome these limitations, we introduce Anchored Posterior Sampling (APS), built on two key innovations: quantized expectation, which provides gradient-like guidance in the discrete embedding space, and anchored remasking, which enables adaptive decoding. APS achieves state-of-the-art performance among discrete diffusion samplers on both linear and nonlinear inverse problems across standard image benchmarks. We demonstrate the generality of APS through training-free stylization and text-guided editing. We further apply APS to a large-scale diffusion language model, showing consistent improvements in question answering.