Discrete diffusion models have emerged as a powerful model class and a promising route to fast language generation, but practical implementations typically rely on factored reverse transitions that ignore cross-token dependencies and degrade performance in the few-step regime. We propose Latent-Augmented Discrete Diffusion (LADD), which introduces a learnable auxiliary latent channel and performs diffusion over the joint (token, latent) space. The latent variables provide an intermediate representation that can express joint structure while preserving tractable parameterizations. We instantiate LADD with continuous latents (Co-LADD) and discrete latents (Di-LADD), and study two inference schedules: a joint diffusion that denoises data and latents together, and a sequential diffusion that first resolves latents and then samples tokens conditionally. We derive ELBO-style objectives and analyze design choices that balance latent expressivity with diffusion compatibility. In experiments, both LADD variants improve unconditional generation metrics over state-of-the-art masked discrete diffusion baselines, and remain effective at low sampling budgets, where unmasking many tokens per step is desirable.
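To make the two inference schedules concrete, here is a minimal, hypothetical sketch of how joint and sequential sampling over the (token, latent) pair could be organized. The model interface (`reverse_step`, `reverse_step_latent`, `reverse_step_tokens`) is an illustrative assumption, not the paper's actual API; this is a sketch of the schedule structure only, not an implementation of the method.

```python
# Hypothetical sketch of LADD's two inference schedules.
# All method names on `model` are assumptions for illustration.
import torch

@torch.no_grad()
def joint_diffusion_sample(model, x_T, z_T, timesteps):
    """Joint schedule: tokens x and latents z are denoised together,
    one reverse step over the joint (token, latent) space per iteration."""
    x, z = x_T, z_T
    for t in reversed(timesteps):
        # A single reverse step updates both channels jointly.
        x, z = model.reverse_step(x, z, t)
    return x

@torch.no_grad()
def sequential_diffusion_sample(model, x_T, z_T, timesteps):
    """Sequential schedule: first resolve the latent channel completely,
    then sample tokens conditioned on the resolved latents."""
    z = z_T
    for t in reversed(timesteps):
        # Phase 1: denoise only the latent channel.
        z = model.reverse_step_latent(z, t)
    x = x_T
    for t in reversed(timesteps):
        # Phase 2: denoise tokens conditioned on the clean latents.
        x = model.reverse_step_tokens(x, z, t)
    return x
```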