Autoregressive language models (ARMs) deliver strong likelihoods but are inherently serial: they generate one token per forward pass, which limits throughput and inflates latency for long sequences. Diffusion Language Models (DLMs) parallelize across positions and thus appear promising for language generation, yet standard discrete diffusion typically needs hundreds to thousands of model evaluations to reach high quality, trading serial depth for iterative breadth. We introduce FS-DFM (Few-Step Discrete Flow-Matching), a discrete flow-matching model designed for speed without sacrificing quality. The core idea is simple: make the number of sampling steps an explicit parameter and train the model to be consistent across step budgets, so that one large move lands where many small moves would. We pair this with a reliable update rule that moves probability in the right direction without overshooting, and with strong teacher guidance distilled from long-run trajectories. Together, these choices make few-step sampling stable, accurate, and easy to control. On language-modeling benchmarks, FS-DFM with 8 sampling steps achieves perplexity parity with a 1,024-step discrete-flow baseline when generating 1,024 tokens with a similar-size model, delivering up to 128× faster sampling and corresponding latency and throughput gains.
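The abstract's core mechanism, a sampler whose step budget is an explicit parameter and whose per-step move size grows as the budget shrinks, can be illustrated with a toy sketch. Everything below (the `toy_denoiser` stand-in, the resampling rule, the vocabulary and sequence sizes) is a hypothetical illustration of few-step discrete-flow sampling, not the paper's actual model or update rule.

```python
import numpy as np

rng = np.random.default_rng(0)

VOCAB = 16  # toy vocabulary size (illustrative)
SEQ = 8     # toy sequence length (illustrative)

def toy_denoiser(x_t, t, n_steps):
    """Stand-in for the learned network: returns per-position probabilities
    over the vocabulary. A real FS-DFM model would condition on the noisy
    sequence x_t, the time t, and the step budget n_steps."""
    logits = rng.normal(size=(SEQ, VOCAB)) / max(t, 1e-3)
    probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return probs / probs.sum(axis=-1, keepdims=True)

def few_step_sample(n_steps):
    """Hypothetical few-step discrete-flow sampling loop: each iteration
    moves a fraction of positions toward the model's prediction, with a
    fraction set by the remaining budget, so 8 steps take much larger
    moves per iteration than 1,024 steps would."""
    x = rng.integers(VOCAB, size=SEQ)      # start from uniform noise
    for k in range(n_steps):
        t = 1.0 - k / n_steps              # time runs from 1 down toward 0
        p = toy_denoiser(x, t, n_steps)
        step = 1.0 / (n_steps - k)         # step size grows near the end
        resample = rng.random(SEQ) < step  # which positions move this step
        proposals = np.array([rng.choice(VOCAB, p=p[i]) for i in range(SEQ)])
        x = np.where(resample, proposals, x)
    return x

tokens = few_step_sample(n_steps=8)
print(tokens.shape)
```

The same loop runs unchanged with `n_steps=1024` or `n_steps=8`; making the budget a parameter of both training and sampling is what lets the model trade step count for move size, which is the consistency property the abstract describes.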