StoRM: A Diffusion-based Stochastic Regeneration Model for Speech Enhancement and Dereverberation

Diffusion models have shown a great ability to bridge the performance gap between predictive and generative approaches for speech enhancement. We have shown that they may even outperform their predictive counterparts for non-additive corruption types or when evaluated on mismatched conditions. However, diffusion models carry a high computational burden, mainly because they require running a neural network for each reverse diffusion step, whereas predictive approaches require only one pass. As generative approaches, diffusion models may also produce vocalizing and breathing artifacts in adverse conditions. In such difficult scenarios, predictive models typically do not produce these artifacts but tend to distort the target speech instead, thereby degrading speech quality. In this work, we present a stochastic regeneration approach in which an estimate given by a predictive model is provided as a guide for further diffusion. We show that the proposed approach uses the predictive model to remove the vocalizing and breathing artifacts while producing very high-quality samples thanks to the diffusion model, even in adverse conditions. We further show that this approach enables the use of lighter sampling schemes with fewer diffusion steps without sacrificing quality, thus reducing the computational burden by an order of magnitude. Source code and audio examples are available online (https://uhh.de/inf-sp-storm).
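The abstract describes the pipeline only at a high level: a single predictive pass yields an estimate, which then guides a shorter diffusion process. The Python below is a minimal toy sketch of that control flow, not the StoRM implementation. Everything concrete in it is an assumption: `predictive_model` and `toy_score` are hypothetical stand-ins for trained networks, and the variance-exploding noise schedule, the starting point (guide plus Gaussian noise), the step count, and the final denoising step are illustrative choices rather than the paper's.

```python
import math
import random
from typing import Callable, List

Signal = List[float]


def predictive_model(noisy: Signal) -> Signal:
    """Hypothetical one-pass predictive enhancer (stand-in for a trained network).

    Toy behaviour only: attenuate the input, as a crude denoiser might.
    """
    return [0.8 * v for v in noisy]


def toy_score(x: Signal, guide: Signal, sigma: float) -> Signal:
    """Hypothetical score model: the exact score of N(guide, sigma^2 I).

    A trained network conditioned on the guide (and the noisy input)
    would replace this in a real system.
    """
    return [(g - xi) / sigma**2 for xi, g in zip(x, guide)]


def regenerate(
    noisy: Signal,
    score_fn: Callable[[Signal, Signal, float], Signal] = toy_score,
    sigma_max: float = 0.5,
    sigma_min: float = 0.01,
    num_steps: int = 5,
    seed: int = 0,
) -> Signal:
    """Predictive estimate first, then a short reverse-diffusion pass guided by it."""
    if num_steps < 2:
        raise ValueError("num_steps must be at least 2")
    rng = random.Random(seed)

    # Stage 1: a single forward pass of the (hypothetical) predictive model.
    guide = predictive_model(noisy)

    # Geometric noise schedule from sigma_max down to sigma_min (assumed).
    ratio = (sigma_min / sigma_max) ** (1.0 / (num_steps - 1))
    sigmas = [sigma_max * ratio**i for i in range(num_steps)]

    # Stage 2: start from the guide perturbed at sigma_max, not from pure noise.
    x = [g + sigma_max * rng.gauss(0.0, 1.0) for g in guide]

    # Reverse-diffusion predictor steps for a variance-exploding process:
    # x <- x + (s_cur^2 - s_next^2) * score + sqrt(s_cur^2 - s_next^2) * z
    for s_cur, s_next in zip(sigmas[:-1], sigmas[1:]):
        score = score_fn(x, guide, s_cur)
        var_step = s_cur**2 - s_next**2
        x = [
            xi + var_step * si + math.sqrt(var_step) * rng.gauss(0.0, 1.0)
            for xi, si in zip(x, score)
        ]

    # Final noise-free denoising step (Tweedie-style): x + sigma^2 * score.
    score = score_fn(x, guide, sigmas[-1])
    return [xi + sigmas[-1] ** 2 * si for xi, si in zip(x, score)]


if __name__ == "__main__":
    noisy = [0.5, -1.0, 0.25, 2.0]
    print(regenerate(noisy, num_steps=5))
```

With the toy score, the final denoising step collapses the output exactly onto the predictive estimate, so the sketch only demonstrates the control flow: one predictive pass, then a few network-guided reverse steps starting near that estimate rather than from pure noise. Starting close to the estimate is what makes a small `num_steps` plausible; a trained score model in place of `toy_score` is what would let the diffusion stage restore detail the predictive pass removed.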
