The sequential nature of autoregressive next-token prediction imposes a fundamental speed limit on large language models. While continuous flow models offer a path to parallel generation, they traditionally demand expensive iterative integration. Flow Maps bypass this bottleneck by compressing generative trajectories into single-step mappings, theoretically enabling the generation of full text sequences from noise in a single forward pass. However, standard formulations rely on Euclidean regression losses that are geometrically ill-suited for discrete data. In this work, we resolve this conflict with Discrete Flow Maps, a framework that reconciles trajectory compression with the geometry of the probability simplex. We recast standard flow map training for the discrete domain, aligning the training dynamics with the discrete nature of language. Empirically, this strict geometric alignment allows our method to surpass previous state-of-the-art results in discrete flow modeling.