In this paper, we present a causal speech signal improvement system that is designed to handle different types of distortions. The method is based on a generative diffusion model which has been shown to work well in scenarios with missing data and non-linear corruptions. To guarantee causal processing, we modify the network architecture of our previous work and replace global normalization with causal adaptive gain control. We generate diverse training data containing a broad range of distortions. This work was performed in the context of an "ICASSP Signal Processing Grand Challenge" and submitted to the non-real-time track of the "Speech Signal Improvement Challenge 2023", where it was ranked fifth.
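To illustrate why replacing global normalization matters for causal processing, the following is a minimal sketch of one generic form of causal adaptive gain control. It is not the system's actual implementation: the function name `causal_agc`, the one-pole power tracker, and the parameters `target_rms`, `alpha`, and `eps` are all assumptions made for illustration. Global normalization (for example, dividing by the peak of the whole utterance) needs the full signal, whereas this sketch computes each output sample from the current and past input samples only.

```python
import numpy as np

def causal_agc(x, target_rms=0.1, alpha=0.999, eps=1e-8):
    """Hypothetical causal gain control: scale each sample using a running
    power estimate built from the current and past samples only."""
    x = np.asarray(x, dtype=np.float64)
    y = np.empty_like(x)
    # Assumed initialization: start the power estimate at the target level,
    # so the gain is close to 1 at the beginning of the signal.
    power = target_rms ** 2
    for n, sample in enumerate(x):
        # One-pole recursive average of the instantaneous power (assumption).
        power = alpha * power + (1.0 - alpha) * sample * sample
        # Gain that pulls the running RMS toward the target level.
        gain = target_rms / np.sqrt(power + eps)
        y[n] = gain * sample
    return y
```

Because the gain at sample `n` depends only on samples up to `n`, changing later parts of the input leaves earlier outputs unchanged, which is the property that global normalization lacks.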