The goal of this study is to implement diffusion models for speech enhancement (SE). We first outline the theoretical foundation of variance-preserving (VP)-based interpolation diffusion under continuous conditions. We then present a more concise framework that encapsulates both the VP- and variance-exploding (VE)-based interpolation diffusion methods, and we demonstrate that these two methods are special cases of the proposed framework. Additionally, we provide a practical example of VP-based interpolation diffusion for the SE task. To improve performance and ease model training, we analyze the common difficulties encountered in diffusion models and suggest suitable hyperparameters. Finally, we compare our model against several methods on a public benchmark to demonstrate the effectiveness of our approach.