We propose Speech Enhancement based on Drifting Models (DriftSE), a novel generative framework that formulates denoising as an equilibrium problem. Rather than relying on iterative sampling, DriftSE natively achieves one-step inference by evolving the pushforward distribution of a mapping function until it matches the clean speech distribution. This evolution is driven by a Drifting Field, a learned correction vector that guides samples toward the high-density regions of the clean distribution. Because the objective matches distributions rather than paired samples, the framework naturally supports training on unpaired data. We investigate the framework under two formulations: a direct mapping from the noisy observation, and a stochastic conditional generative model that starts from a Gaussian prior. Experiments on the VoiceBank-DEMAND benchmark demonstrate that DriftSE achieves high-fidelity enhancement in a single step, outperforming multi-step diffusion baselines and establishing a new paradigm for speech enhancement.
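To make the equilibrium idea concrete, the following toy sketch moves a cloud of 2-D points with a drift field until the field vanishes. It is an illustration under stated assumptions, not the DriftSE implementation: the abstract does not specify the form of the Drifting Field, so the kernel-weighted attraction-minus-repulsion form used here is an assumption. The names `mean_shift`, `drift_field`, and `evolve` and the parameters `bandwidth` and `step_size` are hypothetical. The points are also moved directly, whereas DriftSE trains a mapping function whose pushforward distribution is evolved.

```python
import numpy as np


def mean_shift(x, ref, bandwidth):
    """Kernel-weighted average displacement from each row of x toward ref.

    x: (n, d) query points, ref: (m, d) reference samples.
    Uses a Gaussian kernel with normalized weights (an assumed choice).
    """
    diff = ref[None, :, :] - x[:, None, :]            # (n, m, d)
    sq_dist = np.sum(diff ** 2, axis=-1)              # (n, m)
    logits = -sq_dist / (2.0 * bandwidth ** 2)
    logits -= logits.max(axis=1, keepdims=True)       # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)
    return np.einsum("nm,nmd->nd", weights, diff)     # (n, d)


def drift_field(x, clean, generated, bandwidth=1.0):
    """Assumed drift: attraction to clean samples minus attraction to generated ones.

    When `generated` and `clean` are the same sample set, the two terms
    cancel exactly, which is the equilibrium condition.
    """
    return mean_shift(x, clean, bandwidth) - mean_shift(x, generated, bandwidth)


def evolve(generated, clean, steps=100, step_size=0.5, bandwidth=1.0):
    """Repeatedly move the generated samples along the drift field."""
    x = generated.copy()
    for _ in range(steps):
        x = x + step_size * drift_field(x, clean, x, bandwidth)
    return x


rng = np.random.default_rng(0)
clean = rng.normal(loc=3.0, scale=0.5, size=(256, 2))   # stand-in for clean speech features
start = rng.normal(loc=0.0, scale=0.5, size=(256, 2))   # stand-in for initial model outputs

moved = evolve(start, clean)
gap_before = np.linalg.norm(start.mean(axis=0) - clean.mean(axis=0))
gap_after = np.linalg.norm(moved.mean(axis=0) - clean.mean(axis=0))
print(f"mean gap before: {gap_before:.3f}, after: {gap_after:.3f}")
```

In this sketch no pairing between `start` and `clean` is used anywhere: the field depends only on the two sample sets, which mirrors the unpaired, distribution-matching property described in the abstract.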