We study discrete diffusion for language and other categorical data and focus on a common limitation of masked denoisers: reverse transitions typically factorize across positions, which can weaken joint structure and degrade quality in few-step generation. We propose \emph{Latent Discrete Diffusion Models} (LDDMs), which couple a masked discrete diffusion over tokens with a continuous diffusion over latent embeddings. The latent channel provides a softer signal and carries cross-token dependencies that help resolve ambiguities. We present two instantiations: (i) FUJI-LDDMs, which perform fully joint denoising of tokens and latents, and (ii) SEQ-LDDMs, which first resolve the latent chain and then the discrete chain conditionally on it. For both variants we derive ELBO-style objectives and discuss design choices for learning latents that are informative yet amenable to diffusion modeling. In experiments, LDDMs improve unconditional generation metrics over state-of-the-art masked discrete diffusion baselines and remain effective at lower sampling budgets, where unmasking many tokens per step is desirable.
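For intuition, the following is a minimal, hypothetical sketch of the SEQ-LDDM sampling idea described above: a continuous reverse diffusion first resolves per-token latent embeddings, and a masked discrete denoiser then unmasks tokens conditioned on those latents. The networks \texttt{latent\_denoiser} and \texttt{token\_denoiser}, the noise schedule, and the confidence-based unmasking rule are illustrative placeholders under assumed conventions, not the paper's implementation.

\begin{verbatim}
import torch

def sample_seq_lddm(latent_denoiser, token_denoiser,
                    seq_len, latent_dim, vocab_size, mask_id,
                    n_latent_steps=50, n_token_steps=8, device="cpu"):
    # --- Stage 1: continuous reverse diffusion over per-token latents ---
    # DDPM-style ancestral steps with a simple linear beta schedule (assumed).
    z = torch.randn(seq_len, latent_dim, device=device)
    betas = torch.linspace(1e-4, 0.02, n_latent_steps, device=device)
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)
    for t in reversed(range(n_latent_steps)):
        eps_hat = latent_denoiser(z, t)          # predicted noise (placeholder net)
        coef = betas[t] / torch.sqrt(1.0 - alpha_bars[t])
        mean = (z - coef * eps_hat) / torch.sqrt(alphas[t])
        noise = torch.randn_like(z) if t > 0 else torch.zeros_like(z)
        z = mean + torch.sqrt(betas[t]) * noise

    # --- Stage 2: masked discrete diffusion over tokens, conditioned on z ---
    # Start fully masked; unmask a fraction of positions per step, most
    # confident first (one common heuristic, assumed here for illustration).
    x = torch.full((seq_len,), mask_id, dtype=torch.long, device=device)
    for s in range(n_token_steps):
        masked = (x == mask_id)
        if not masked.any():
            break
        logits = token_denoiser(x, z)            # (seq_len, vocab_size)
        probs = torch.softmax(logits, dim=-1)
        conf, pred = probs.max(dim=-1)
        conf = torch.where(masked, conf, torch.full_like(conf, -1.0))
        n_to_unmask = max(1, int(masked.sum().item() / (n_token_steps - s)))
        idx = conf.topk(n_to_unmask).indices
        x[idx] = pred[idx]
    return x
\end{verbatim}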