Masked Diffusion Models (MDMs) offer an efficient non-causal alternative to autoregressive generation, but because they rely on discrete marginal distributions they often fail to capture token dependencies and can produce semantically incoherent text. We address these limitations by moving the diffusion process into a continuous, sentence-level semantic space. We propose CRoCoDiL (Continuous and Robust Conditioned Diffusion for Language), a unified fine-tuning approach that jointly trains an encoder-demasker architecture, grounding MDM demasking in continuous latent representations. The result is a novel autoencoder whose decoding is performed by an MDM algorithm. Within the same framework, we introduce two unconditional text synthesis algorithms: Continuous-Then-Discrete (ConThenDisc), a hybrid-diffusion approach that first generates latent representations in continuous space and then decodes them into tokens via an MDM, and Continuous-Within-Discrete (ConWithinDisc), a multi-diffusion strategy that refines latent representations throughout the discrete sampling process. Experiments using LLaDA show that our methods achieve superior generation quality and more than 10x faster sampling in an unconditional setting.
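The two-stage control flow of ConThenDisc can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and not the paper's implementation: the names `sample_latent`, `toy_demasker`, `mdm_decode`, and `con_then_disc` are hypothetical, the "denoiser" and "demasker" are hand-written placeholders for learned networks, and the confidence-ranked unmasking schedule is one common MDM sampling choice that the abstract does not confirm.

```python
import random

MASK = "<mask>"
VOCAB = ["the", "cat", "sat", "on", "a", "mat"]  # toy vocabulary (assumption)


def sample_latent(dim, steps, rng):
    """Stage 1 (continuous): toy iterative refinement starting from Gaussian noise.

    The update below only shrinks the vector; a real system would apply a
    learned denoiser in the sentence-level latent space.
    """
    z = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    for _ in range(steps):
        z = [0.9 * v for v in z]  # placeholder for a learned denoising step
    return z


def toy_demasker(tokens, z, rng):
    """Placeholder for a latent-conditioned demasker.

    Returns a (token, confidence) proposal for every still-masked position.
    """
    proposals = {}
    for i, tok in enumerate(tokens):
        if tok == MASK:
            # Hypothetical conditioning: the latent biases which token is proposed.
            idx = int(abs(z[i % len(z)]) * 1000 + i) % len(VOCAB)
            proposals[i] = (VOCAB[idx], rng.random())
    return proposals


def mdm_decode(z, seq_len, steps, rng):
    """Stage 2 (discrete): start fully masked and unmask a few positions per step."""
    tokens = [MASK] * seq_len
    per_step = -(-seq_len // steps)  # ceiling division
    while MASK in tokens:
        proposals = toy_demasker(tokens, z, rng)
        # Commit the most confident proposals first (assumed schedule).
        ranked = sorted(proposals.items(), key=lambda kv: kv[1][1], reverse=True)
        for i, (tok, _conf) in ranked[:per_step]:
            tokens[i] = tok
    return tokens


def con_then_disc(seq_len=8, latent_dim=4, latent_steps=5, decode_steps=4, seed=0):
    """Generate a latent first, then decode it to tokens with the masked loop."""
    rng = random.Random(seed)
    z = sample_latent(latent_dim, latent_steps, rng)
    return mdm_decode(z, seq_len, decode_steps, rng)


print(con_then_disc())
```

In this sketch the latent is fixed before any token is unmasked, which is the ordering the name Continuous-Then-Discrete suggests. A ConWithinDisc-style variant would instead update `z` inside the decoding loop; that variant is not shown here.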