Diffusion models have proven to be powerful models for generative speech enhancement. In recent SGMSE+ approaches, training involves a stochastic differential equation (SDE) that defines the diffusion process, gradually adding both Gaussian and environmental noise to the clean speech signal. Speech enhancement performance depends on the choice of this SDE, which controls how the mean and the variance evolve along the diffusion process as environmental and Gaussian noise are added. In this work, we highlight that the scale of the variance is a dominant parameter for speech enhancement performance and show that it controls the tradeoff between noise attenuation and speech distortion. More concretely, we show that a larger variance increases noise attenuation and reduces the computational footprint, as fewer function evaluations are required to generate the estimate.
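To make the roles of the mean and the variance concrete, the following is a minimal sketch of a forward perturbation of the kind used in SGMSE+-style models. It assumes the Ornstein-Uhlenbeck variance-exploding (OUVE) form known from the SGMSE+ literature, which the abstract itself does not spell out. The function names, the parameter values (`gamma`, `sigma_min`, `sigma_max`), and the use of real-valued arrays instead of complex spectrograms are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def ouve_mean(x0, y, t, gamma=1.5):
    """Mean of the perturbed state: drifts from clean speech x0 toward noisy speech y.

    Assumed OUVE form; gamma is a hypothetical stiffness value.
    """
    decay = np.exp(-gamma * t)
    return decay * x0 + (1.0 - decay) * y


def ouve_std(t, gamma=1.5, sigma_min=0.05, sigma_max=0.5):
    """Standard deviation of the Gaussian noise at time t (assumed OUVE form).

    sigma_max sets the scale of the variance, the parameter the abstract
    identifies as dominant. Default values are illustrative only.
    """
    log_ratio = np.log(sigma_max / sigma_min)
    var = (
        sigma_min**2
        * ((sigma_max / sigma_min) ** (2.0 * t) - np.exp(-2.0 * gamma * t))
        * log_ratio
        / (gamma + log_ratio)
    )
    return np.sqrt(var)


def perturb(x0, y, t, rng, gamma=1.5, sigma_min=0.05, sigma_max=0.5):
    """Sample a perturbed state x_t = mean(x0, y, t) + std(t) * z, with z ~ N(0, I)."""
    z = rng.standard_normal(x0.shape)
    return ouve_mean(x0, y, t, gamma) + ouve_std(t, gamma, sigma_min, sigma_max) * z


# Toy usage: real-valued stand-ins for clean and noisy signals (an assumption).
rng = np.random.default_rng(0)
x0 = rng.standard_normal(16)             # stand-in for clean speech
y = x0 + 0.3 * rng.standard_normal(16)   # stand-in for speech plus environmental noise
xt = perturb(x0, y, t=1.0, rng=rng)
```

In this sketch, raising `sigma_max` increases the Gaussian noise level at every `t > 0` while leaving the mean trajectory unchanged, which is one way to vary the variance scale independently of the environmental-noise interpolation.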