Diffusion models have proven to be powerful tools for generative speech enhancement. In recent SGMSE+ approaches, training relies on a stochastic differential equation (SDE) that defines a diffusion process gradually adding both Gaussian and environmental noise to the clean speech signal. Speech enhancement performance depends on the choice of SDE, which controls how the mean and the variance evolve along the diffusion process as environmental and Gaussian noise are added. In this work, we highlight that the scale of the variance is a dominant parameter for speech enhancement performance and show that it controls the trade-off between noise attenuation and speech distortion. More concretely, we show that a larger variance increases noise attenuation and reduces the computational footprint, as fewer function evaluations are required to generate the estimate.
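To make the role of the variance concrete, here is a minimal sketch that assumes the Ornstein-Uhlenbeck SDE with a variance-exploding noise schedule (often called OUVE) from the earlier SGMSE+ work. This abstract does not spell that SDE out, so treat it as an assumption: dx_t = gamma (y - x_t) dt + g(t) dw, with g(t) = sigma_min (sigma_max/sigma_min)^t sqrt(2 log(sigma_max/sigma_min)), where x_0 is clean speech and y is noisy speech. The mean moves exponentially from x_0 toward y, and the variance has a closed form. The function names `ouve_mean`, `ouve_var`, and `perturb` are hypothetical, and the defaults gamma=1.5, sigma_min=0.05, sigma_max=0.5 are illustrative. Plain real-valued lists stand in for the speech representation a real system would use.

```python
import math
import random

# Hypothetical helpers for the assumed OUVE forward process (not taken from this abstract).
# gamma pulls the mean from clean speech x0 toward noisy speech y;
# sigma_min and sigma_max bound the Gaussian-noise schedule and set the variance scale.

def ouve_mean(x0, y, t, gamma):
    """Mean of x_t: exponential interpolation from clean x0 toward noisy y."""
    w = math.exp(-gamma * t)
    return w * x0 + (1.0 - w) * y

def ouve_var(t, gamma, sigma_min, sigma_max):
    """Closed-form variance of x_t, solving dVar/dt = -2*gamma*Var + g(t)^2 with Var(0) = 0."""
    k = sigma_max / sigma_min
    log_k = math.log(k)
    return sigma_min ** 2 * (k ** (2.0 * t) - math.exp(-2.0 * gamma * t)) * log_k / (gamma + log_k)

def perturb(x0, y, t, gamma=1.5, sigma_min=0.05, sigma_max=0.5, rng=random):
    """Draw x_t ~ N(mean, var) independently for each sample (illustrative values only)."""
    std = math.sqrt(ouve_var(t, gamma, sigma_min, sigma_max))
    return [ouve_mean(a, b, t, gamma) + std * rng.gauss(0.0, 1.0) for a, b in zip(x0, y)]
```

For example, with gamma=1.5 and sigma_min=0.05, raising sigma_max from 0.5 to 1.0 lifts `ouve_var` at t = 1 from about 0.15 to about 0.67. This only shows how the variance scale enters the forward process. It does not reproduce the paper's results.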