Uniform-noise discrete diffusion and flow models (e.g., D3PM, SEDD, UDLM, DFM) generate sequences non-autoregressively by iteratively refining randomly initialized vocabulary tokens through multiple context-dependent replacements. These models are typically formulated as time-inhomogeneous CTMC/DTMC processes and sampled using independent Bernoulli change decisions at each discretization step. This induces Poisson-binomial variance in per-position jump counts that grows with the number of required edits, leading to the characteristic under-editing (residual noise) and over-editing (cascading substitutions) failure modes that degrade sample quality, especially under tight discretization budgets. In contrast, absorbing-state (mask-start) models avoid this instability by allowing each position to jump at most once. We propose Stratified Hazard Sampling (SHS), a training-free, drop-in, and hyperparameter-free inference principle for any sampler that admits a stay-vs.-replace decomposition. SHS models per-token edits as events driven by cumulative hazard (CTMC) or cumulative jump mass (DTMC) and places events by stratifying this cumulative quantity: with a single random phase per position, a token is updated whenever its accumulated hazard crosses unit-spaced thresholds. This preserves the expected number of jumps while achieving the minimum possible conditional variance among unbiased integer estimators (bounded by 1/4 for any fixed cumulative mass), without altering per-jump destination sampling and thus retaining multimodality. Experiments on uniform-noise discrete diffusion language models show that SHS consistently improves sample quality. We further show that SHS improves robustness under token-level blacklist filtering, with benefits increasing as lexical constraints grow more severe.
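The stratification idea described above can be sketched for a single sequence position. This is a minimal illustration, not the authors' implementation: it assumes the sampler exposes per-step hazard increments h_t (so the cumulative hazard is H = sum of the h_t), and the function name and interface are hypothetical. A single uniform phase U is drawn once per position, and a replacement event fires each time the running cumulative hazard crosses a threshold U, U+1, U+2, ...; the event count is floor(H - U) + 1 when H >= U and 0 otherwise, which is unbiased for H with conditional variance at most 1/4.

```python
import random

def stratified_hazard_jumps(step_hazards, rng):
    """Return the discretization steps at which one position's token is
    replaced, given per-step hazard increments h_t (illustrative sketch).

    A single phase U ~ Uniform[0, 1) is drawn for this position; a jump
    fires whenever the running cumulative hazard crosses a unit-spaced
    threshold U, U + 1, U + 2, ...  This preserves the expected jump
    count while keeping its conditional variance at most 1/4, unlike
    independent per-step Bernoulli decisions.
    """
    u = rng.random()          # single random phase for this position
    cum = 0.0                 # running cumulative hazard
    next_threshold = u        # next unit-spaced threshold to cross
    jump_steps = []
    for t, h in enumerate(step_hazards):
        cum += h
        # a large step may cross several thresholds at once
        while cum >= next_threshold:
            jump_steps.append(t)
            next_threshold += 1.0
    return jump_steps

# With cumulative hazard H = 2.8, the jump count is always 2 or 3
# (floor(2.8 - U) + 1), whereas independent Bernoulli decisions would
# spread the count over a wider Poisson-binomial range.
rng = random.Random(1)
counts = [len(stratified_hazard_jumps([0.7] * 4, rng)) for _ in range(10000)]
```

Destination sampling is untouched by this scheme: whenever a jump fires, the new token is still drawn from the model's usual per-jump replacement distribution, so multimodality is preserved exactly as the abstract states.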