This paper presents the speech restoration and enhancement system developed by the 1024K team for the ICASSP 2024 Speech Signal Improvement (SSI) Challenge. Our system consists of a complex-domain generative adversarial network (GAN) for speech restoration and a fine-grained multi-band fusion module for speech enhancement. On the SSI blind test set, the proposed system achieves the following results, ranking 1st in both tracks:

- Real-time track: an overall mean opinion score (MOS) of 3.49 based on ITU-T P.804 and a Word Accuracy Rate (WAcc) of 0.78.
- Non-real-time track: an overall P.804 MOS of 3.43 and a WAcc of 0.78.