Flow Language Models (FLMs) are a recently introduced class of language models that adapt continuous flow matching to one-hot encoded token sequences. Their denoisers have a special structure absent from generic continuous diffusion models: each block of the denoising mean is a posterior marginal distribution over the clean token at that position. Standard DDPM-style samplers collapse these marginals to a single conditional-mean endpoint and bridge toward this simplex-valued point, which is generally not a valid one-hot sequence. We argue that the natural sampler for an FLM is instead posterior-predictive. At each reverse step, we sample a clean one-hot endpoint from the factorized posterior defined by the FLM token marginals, and then sample the next continuous state from the analytic Ornstein-Uhlenbeck bridge conditioned on that endpoint. The method is training-free, uses the same number of model evaluations as standard sampling, and gives a principled interface for token-level decoding controls such as temperature scaling and nucleus truncation. We show that, under exact posterior marginals, the endpoint approximation error is exactly the conditional multi-information among token positions. The induced one-step bridge kernel preserves all token-wise posterior-predictive marginals and loses only the residual cross-position dependence. Finally, we prove a Girsanov path-space comparison showing that the marginal-conditioned bridge has a denoising-error term no larger than that of the frozen conditional-mean bridge, with strict improvement whenever intermediate coordinate-wise bridge observations reveal additional information about the clean token. Experiments with FLMs show that the sampler improves the quality-diversity tradeoff. Code is available at: github.com/imbirik/mcb.
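To make the two-stage reverse step concrete, the following is a minimal NumPy sketch, not the released implementation. It assumes a forward process of the form x_t = alpha_t * x_0 + sigma_t * noise, for which the bridge to an earlier time s given the endpoint is the standard Gaussian posterior. The function names `sample_endpoint` and `bridge_step`, the `(L, V)` array layout, and the schedule values are hypothetical choices made for illustration only.

```python
import numpy as np

def sample_endpoint(probs, rng, temperature=1.0):
    """Draw a one-hot sequence from per-position token marginals.

    probs: array of shape (L, V), each row a distribution over V tokens
    (assumed to be the denoiser output of an FLM at the current state).
    """
    # Temperature scaling is applied to the log-marginals, then renormalized.
    logits = np.log(np.clip(probs, 1e-12, None)) / temperature
    logits -= logits.max(axis=-1, keepdims=True)
    p = np.exp(logits)
    p /= p.sum(axis=-1, keepdims=True)
    length, vocab = p.shape
    # Positions are sampled independently (factorized posterior).
    tokens = np.array([rng.choice(vocab, p=p[i]) for i in range(length)])
    return np.eye(vocab)[tokens]

def bridge_step(x_t, x0, alpha_s, sigma_s, alpha_t, sigma_t, rng):
    """Sample x_s (s < t) from the Gaussian bridge given x_t and endpoint x0.

    Assumes x_t = alpha_t * x0 + sigma_t * noise with a Markov Gaussian
    forward process; this parameterization is an assumption of the sketch.
    """
    alpha_ts = alpha_t / alpha_s
    var_ts = sigma_t**2 - alpha_ts**2 * sigma_s**2
    mean = (alpha_ts * sigma_s**2 / sigma_t**2) * x_t \
         + (alpha_s * var_ts / sigma_t**2) * x0
    var = var_ts * sigma_s**2 / sigma_t**2
    return mean + np.sqrt(var) * rng.standard_normal(x_t.shape)

# One reverse step with placeholder marginals standing in for the model.
rng = np.random.default_rng(0)
probs = rng.dirichlet(np.ones(5), size=4)        # (L=4, V=5) stand-in marginals
x_t = rng.standard_normal((4, 5))
x0_hat = sample_endpoint(probs, rng, temperature=0.8)
x_s = bridge_step(x_t, x0_hat, alpha_s=0.9, sigma_s=np.sqrt(1 - 0.81),
                  alpha_t=0.6, sigma_t=np.sqrt(1 - 0.36), rng=rng)
```

A conditional-mean sampler would pass `probs` itself as the endpoint; the only change in this sketch is that the bridge is conditioned on the sampled one-hot `x0_hat`, so the number of model evaluations per step is unchanged.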