The goal of this study is to implement diffusion models for speech enhancement (SE). We first elaborate on the theoretical foundation of variance-preserving (VP)-based interpolation diffusion under continuous conditions. We then present a more concise framework that encompasses both VP- and variance-exploding (VE)-based interpolation diffusion, and we show that these two methods are special cases of it. Additionally, we provide a practical example of VP-based interpolation diffusion for the SE task. To improve performance and ease model training, we analyze the difficulties commonly encountered in diffusion models and suggest suitable hyperparameters. Finally, we compare our model with several methods on a public benchmark to demonstrate the effectiveness of our approach.
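The abstract does not give the unified framework's equations. The following is therefore only a minimal sketch under stated assumptions, meant to show how VP and VE can both fit one interpolation forward process. It assumes a generic form x_t = a(t)[(1 - λ(t)) x_0 + λ(t) y] + σ(t) z, where x_0 is clean speech, y is noisy speech, and z is standard Gaussian noise. The function names `interpolation_forward`, `vp_schedule`, and `ve_schedule`, the choice λ(t) = t, and all schedule constants are hypothetical. In the VP case a(t) = sqrt(ᾱ(t)) and σ(t) = sqrt(1 - ᾱ(t)), so a(t)² + σ(t)² = 1. In the VE case a(t) = 1 and σ(t) grows geometrically with t.

```python
import numpy as np

# Hypothetical sketch of a generic interpolation forward process for SE.
# Not the paper's exact formulation; names and constants are illustrative.

def interpolation_forward(x0, y, lam, scale, sigma, rng):
    """Sample x_t = scale * ((1 - lam) * x0 + lam * y) + sigma * z, z ~ N(0, I).

    The mean moves from clean speech x0 (lam = 0) toward noisy speech y (lam = 1).
    Returns the noisy sample and its mean.
    """
    z = rng.standard_normal(np.shape(x0))
    mean = scale * ((1.0 - lam) * x0 + lam * y)
    return mean + sigma * z, mean


def vp_schedule(t, beta_min=0.1, beta_max=20.0):
    """Assumed VP choice with linear beta: alpha_bar(t) = exp(-int_0^t beta(s) ds)."""
    integral = beta_min * t + 0.5 * (beta_max - beta_min) * t ** 2
    alpha_bar = np.exp(-integral)
    # scale**2 + sigma**2 == 1 (up to rounding), hence "variance preserving"
    return np.sqrt(alpha_bar), np.sqrt(1.0 - alpha_bar)


def ve_schedule(t, sigma_min=0.05, sigma_max=0.5):
    """Assumed VE choice: no mean scaling, geometric growth of the noise std."""
    return 1.0, sigma_min * (sigma_max / sigma_min) ** t


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x0 = np.zeros(8)  # stand-in for clean speech
    y = np.ones(8)    # stand-in for noisy speech
    t = 0.5
    for name, schedule in [("VP", vp_schedule), ("VE", ve_schedule)]:
        scale, sigma = schedule(t)
        # lam(t) = t is an assumption made only for this illustration
        x_t, mean = interpolation_forward(x0, y, t, scale, sigma, rng)
        print(name, float(scale), float(sigma), mean[:2])
```

Under these assumed choices, VP and VE differ only in the (a(t), σ(t)) pair passed to a single sampling function, which is one way a common framework can cover both. The paper's actual parameterization may differ.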