Autoregressive language models dominate modern text generation, yet their sequential nature introduces fundamental limitations: decoding is slow, and maintaining global coherence remains challenging. Diffusion models offer a promising alternative by enabling parallel generation and flexible control; however, their application to text generation is hindered by the high dimensionality of token-level representations. We introduce Cosmos, a novel approach to text generation that operates entirely in a compressed, smooth latent space tailored specifically for diffusion. This space is learned using an autoencoder trained simultaneously for token-level reconstruction and alignment with frozen activations from a pretrained language encoder, providing robust semantic grounding and enabling effective perturbation-based augmentations. Empirically, we demonstrate that text representations can be compressed by $8\times$ while maintaining generation quality comparable to token-level diffusion models. Furthermore, increasing the latent sequence length allows Cosmos to surpass both diffusion-based and autoregressive baselines. We evaluate Cosmos on four diverse generative tasks (story generation, question generation, summarization, and detoxification) and compare it with various generative paradigms. Cosmos achieves comparable or superior generation quality while offering more than $2\times$ faster inference. Code is released at \href{https://github.com/MeshchaninovViacheslav/cosmos}{GitHub}.
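As a rough illustration of the training signal described above (a sketch only; the exact losses, weighting, and projection are not specified in the abstract), the autoencoder objective can be viewed as a combination of a token-level reconstruction loss and an alignment term against the frozen encoder activations:
\[
\mathcal{L}_{\text{AE}}
= \underbrace{\mathrm{CE}\big(\mathrm{Dec}(\mathrm{Enc}(x)),\, x\big)}_{\text{token-level reconstruction}}
\;+\; \lambda \, \underbrace{\big\lVert \phi\big(\mathrm{Enc}(x)\big) - E_{\text{frozen}}(x) \big\rVert_2^2}_{\text{alignment to frozen activations}},
\]
where $\mathrm{Enc}$ and $\mathrm{Dec}$ denote the trainable autoencoder, $E_{\text{frozen}}$ is the pretrained language encoder, $\phi$ is a hypothetical projection matching latent and encoder dimensions, and $\lambda$ is an assumed balancing weight.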