Continuous Latent Diffusion Language Model

Large language models have achieved remarkable success under the autoregressive paradigm, yet high-quality text generation need not be tied to a fixed left-to-right order. Existing alternatives still struggle to jointly achieve generation efficiency, scalable representation learning, and effective global semantic modeling. We propose Cola DLM, a hierarchical latent diffusion language model that frames text generation through hierarchical information decomposition. Cola DLM first learns a stable text-to-latent mapping with a Text VAE, then models a global semantic prior in continuous latent space with a block-causal DiT, and finally generates text through conditional decoding. From a unified Markov-path perspective, its diffusion process performs latent prior transport rather than token-level observation recovery, thereby separating global semantic organization from local textual realization. This design yields a more flexible non-autoregressive inductive bias, supports semantic compression and prior fitting in continuous space, and naturally extends to other continuous modalities. Through experiments spanning 4 research questions, 8 benchmarks, strictly matched ~2B-parameter autoregressive and LLaDA baselines, and scaling curves up to about 2000 EFLOPs, we identify an effective overall configuration of Cola DLM and verify its strong scaling behavior for text generation. Taken together, the results establish hierarchical continuous latent prior modeling as a principled alternative to strictly token-level language modeling, where generation quality and scaling behavior may better reflect model capability than likelihood, while also suggesting a concrete path toward unified modeling across discrete text and continuous modalities.
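To make the term "block-causal" concrete, the sketch below builds the attention mask such a prior model could use over a sequence of latents. It is an illustration only. The abstract does not specify the mask, the block size, or how latents are grouped, so the function name `block_causal_mask`, the parameter `block_size`, and the convention assumed here (full attention inside a block, causal attention across blocks) are hypothetical and may differ from the Cola DLM implementation.

```python
def block_causal_mask(num_latents: int, block_size: int) -> list[list[bool]]:
    """Return mask[i][j] == True when latent position i may attend to position j.

    Assumed convention (not taken from the paper): positions are grouped into
    consecutive blocks of `block_size`; attention is bidirectional inside a
    block and causal across blocks, so a position sees its own block and all
    earlier blocks, but no later block.
    """
    if num_latents < 0 or block_size <= 0:
        raise ValueError("num_latents must be >= 0 and block_size must be > 0")

    # Block index of every latent position.
    block_of = [i // block_size for i in range(num_latents)]

    # Position i may attend to j iff j's block is not later than i's block.
    return [
        [block_of[j] <= block_of[i] for j in range(num_latents)]
        for i in range(num_latents)
    ]


if __name__ == "__main__":
    # Visualize a small mask: "x" marks an allowed query/key pair.
    for row in block_causal_mask(num_latents=6, block_size=2):
        print("".join("x" if allowed else "." for allowed in row))
```

Under this assumed convention, setting `block_size` equal to `num_latents` gives fully bidirectional attention, and setting it to 1 gives an ordinary causal mask, so the block size interpolates between the two regimes.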
