Masked diffusion models have recently emerged as a flexible framework for discrete generative modeling. However, a key limitation of standard masked diffusion is its inability to effectively capture dependencies among tokens that are predicted concurrently, which degrades generation quality when such dependencies matter. To model inter-token dependencies explicitly, we propose Variational Masked Diffusion (VMD), a framework that introduces latent variables into the masked diffusion process. Through controlled experiments on synthetic datasets, we demonstrate that VMD successfully learns dependencies that conventional masked diffusion fails to capture. We further validate the effectiveness of our approach on Sudoku puzzles and text datasets, where learning inter-token dependencies improves global consistency. Across these domains, VMD enhances both generation quality and dependency awareness, highlighting the value of integrating variational inference into masked diffusion. Our code is available at: https://riccizz.github.io/VMD.