This paper introduces our repairing and denoising network (RaD-Net) for the ICASSP 2024 Speech Signal Improvement (SSI) Challenge. We extend our previous two-stage framework and propose an upgraded model. Specifically, we replace the repairing network with COM-Net from TEA-PSE. In addition, multi-resolution and multi-band discriminators are adopted in the training stage. Finally, we use a three-step training strategy to optimize the model. We submit two models with different sets of parameters to meet the real-time factor (RTF) requirements of the two tracks. According to the official results, the proposed systems rank 2nd in track 1 and 3rd in track 2.