Masked Diffusion Models (MDMs) offer an efficient, non-causal alternative to autoregressive generation, but they often struggle to capture dependencies between tokens and can produce semantically incoherent text because they rely on discrete marginal distributions. We address these limitations by shifting the diffusion process into a continuous, sentence-level semantic space. We propose CRoCoDiL (Continuous and Robust Conditioned Diffusion for Language), a unified fine-tuning approach that jointly trains an encoder and a demasker, grounding MDM demasking in continuous latent representations. The result is a novel autoencoder whose decoder is an MDM demasking algorithm. Building on the same framework, we introduce two unconditional text synthesis algorithms: Continuous-Then-Discrete (ConThenDisc), a hybrid diffusion approach that first generates latent representations in continuous space and then decodes them into tokens with an MDM; and Continuous-Within-Discrete (ConWithinDisc), a multi-diffusion strategy that refines the latent representations throughout the discrete sampling process. Experiments with LLaDA show that, in the unconditional setting, our methods achieve superior generation quality and more than 10x faster sampling.
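The difference between the two sampling procedures can be illustrated with a toy, pure-Python sketch of the control flow only. Everything in it is an assumption for illustration: `sample_latent`, `demask_with_latent`, the `toy_*` stand-ins, the random unmasking schedule (one common MDM choice), and the `refine` hook are hypothetical and are not CRoCoDiL's or LLaDA's actual components. A real system would use a trained encoder, latent denoiser, and LLaDA-style demasker. With `refine=None`, the latent is sampled once and held fixed while tokens are unmasked (the ConThenDisc pattern). Passing a `refine` function updates the latent between discrete steps (the ConWithinDisc pattern).

```python
import random
from typing import Callable, List, Optional

MASK = "[MASK]"
Latent = List[float]


def sample_latent(dim: int, steps: int,
                  denoise: Callable[[Latent, int], Latent],
                  rng: random.Random) -> Latent:
    """Continuous stage (hypothetical): start from Gaussian noise and apply
    a denoiser for `steps` reverse steps, t = steps-1, ..., 0."""
    z = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    for t in reversed(range(steps)):
        z = denoise(z, t)
    return z


def demask_with_latent(length: int, steps: int, z: Latent,
                       demask: Callable[[List[str], Latent], List[str]],
                       rng: random.Random,
                       refine: Optional[Callable[[Latent, List[str], int], Latent]] = None,
                       ) -> List[str]:
    """Discrete stage (hypothetical): start fully masked and, at each step,
    reveal ceil(masked / steps_left) randomly chosen masked positions using
    the latent-conditioned demasker. If `refine` is given, z is updated after
    every step (ConWithinDisc-style); otherwise z stays fixed (ConThenDisc-style)."""
    tokens = [MASK] * length
    masked = list(range(length))
    for step in range(steps):
        k = -(-len(masked) // (steps - step))  # ceiling division; last step reveals the rest
        preds = demask(tokens, z)
        chosen = set(rng.sample(masked, k))
        for i in chosen:
            tokens[i] = preds[i]
        masked = [i for i in masked if i not in chosen]
        if refine is not None:
            z = refine(z, tokens, step)
    return tokens


# --- Hypothetical toy stand-ins for the learned networks ---
VOCAB = ["the", "cat", "sat", "on", "a", "mat"]


def toy_denoise(z: Latent, t: int) -> Latent:
    # Stand-in for a learned latent denoiser: simply shrinks z toward 0.
    return [0.9 * x for x in z]


def toy_demask(tokens: List[str], z: Latent) -> List[str]:
    # Stand-in for the MDM: predicts one token per position from z only.
    return [VOCAB[int(abs(z[i % len(z)]) * 100) % len(VOCAB)]
            for i in range(len(tokens))]


def toy_refine(z: Latent, tokens: List[str], step: int) -> Latent:
    # Stand-in latent refinement: nudges z by the number of revealed tokens.
    n_visible = sum(tok != MASK for tok in tokens)
    return [x + 0.01 * n_visible for x in z]


rng = random.Random(0)
z = sample_latent(dim=8, steps=10, denoise=toy_denoise, rng=rng)
con_then_disc = demask_with_latent(12, 4, z, toy_demask, rng)
con_within_disc = demask_with_latent(12, 4, z, toy_demask, rng, refine=toy_refine)
print(" ".join(con_then_disc))
print(" ".join(con_within_disc))
```

Because `toy_demask` ignores the partially unmasked sequence, the ConThenDisc output here equals a single demasking pass; a real MDM would condition each step on the tokens revealed so far.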