Generative Modeling via Drifting has recently achieved state-of-the-art one-step image generation through a kernel-based drift operator, yet this success is largely empirical and its theoretical foundations remain poorly understood. In this paper, we make the following observation: \emph{under a Gaussian kernel, the drift operator is exactly a score difference on smoothed distributions}. This insight allows us to answer all three key questions left open in the original work: (1) whether a vanishing drift guarantees equality of distributions ($V_{p,q}=0\Rightarrow p=q$), (2) how to choose between kernels, and (3) why the stop-gradient operator is indispensable for stable training. Our observations position drifting within the well-studied score-matching family and enable a rich theoretical perspective. By linearizing the McKean--Vlasov dynamics and probing them in Fourier space, we reveal frequency-dependent convergence timescales analogous to \emph{Landau damping} in plasma kinetic theory: the Gaussian kernel suffers an exponential high-frequency bottleneck, which explains the empirical preference for the Laplacian kernel. We also propose an exponential bandwidth annealing schedule $\sigma(t)=\sigma_0 e^{-rt}$ that reduces convergence time from $\exp(O(K_{\max}^2))$ to $O(\log K_{\max})$. Finally, by formalizing drifting as a Wasserstein gradient flow of the smoothed KL divergence, we prove that the stop-gradient operator follows directly from the frozen-field discretization mandated by the JKO scheme, and that removing it severs training from any gradient-flow guarantee. This variational perspective further provides a general template for constructing novel drift operators, which we demonstrate with a Sinkhorn divergence drift.
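The central observation can be checked numerically in one dimension. The sketch below is illustrative only and rests on two assumptions that the abstract does not state: that the drift takes the normalized attraction-minus-repulsion form $V_{p,q}(x)=\frac{\mathbb{E}_{y\sim p}[k(x,y)(y-x)]}{\mathbb{E}_{y\sim p}[k(x,y)]}-\frac{\mathbb{E}_{y\sim q}[k(x,y)(y-x)]}{\mathbb{E}_{y\sim q}[k(x,y)]}$, and that the kernel is $k(x,y)=\exp(-|x-y|^2/(2\sigma^2))$. Under these assumptions the standard mean-shift identity gives $V_{p,q}(x)=\sigma^2\,(\nabla\log p_\sigma(x)-\nabla\log q_\sigma(x))$ with $p_\sigma=p*\mathcal{N}(0,\sigma^2)$. All function names are hypothetical, and $p$, $q$ are taken to be Gaussians so that the smoothed scores are available in closed form.

```python
import math


def gaussian_pdf(y, mean, std):
    """Density of N(mean, std^2) at y."""
    return math.exp(-0.5 * ((y - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))


def mean_shift(x, mean, std, sigma, lo=-12.0, hi=12.0, n=24001):
    """E[k(x,y)(y-x)] / E[k(x,y)] for y ~ N(mean, std^2), via a Riemann sum.

    Assumed normalized form of one half of the drift; the grid spacing
    cancels in the ratio, so it is not applied explicitly.
    """
    step = (hi - lo) / (n - 1)
    num = den = 0.0
    for i in range(n):
        y = lo + i * step
        # density of y times the Gaussian kernel weight k(x, y)
        w = gaussian_pdf(y, mean, std) * math.exp(-((x - y) ** 2) / (2 * sigma ** 2))
        num += w * (y - x)
        den += w
    return num / den


def smoothed_score(x, mean, std, sigma):
    """Score of N(mean, std^2) convolved with N(0, sigma^2), i.e. N(mean, std^2 + sigma^2)."""
    return -(x - mean) / (std ** 2 + sigma ** 2)


def kernel_drift(x, p, q, sigma):
    """Attraction toward p minus repulsion from q (assumed sign convention)."""
    return mean_shift(x, *p, sigma) - mean_shift(x, *q, sigma)


def score_difference(x, p, q, sigma):
    """sigma^2 times the difference of the smoothed scores."""
    return sigma ** 2 * (smoothed_score(x, *p, sigma) - smoothed_score(x, *q, sigma))


if __name__ == "__main__":
    p, q, sigma = (1.0, 1.0), (-0.5, 2.0), 0.5  # (mean, std) pairs, chosen arbitrarily
    for x in (-1.0, 0.5, 2.0):
        print(x, kernel_drift(x, p, q, sigma), score_difference(x, p, q, sigma))
```

In this toy setting the two columns printed for each $x$ should agree up to quadrature error, and both vanish when $p=q$. The sketch says nothing about the converse direction ($V_{p,q}=0\Rightarrow p=q$), which is the nontrivial question addressed in the paper.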