In this paper, we present a method for fine-tuning models trained on the Deep Noise Suppression (DNS) 2020 Challenge to improve their performance in Voice over Internet Protocol (VoIP) applications. Our approach adapts DNS 2020 models to the specific acoustic characteristics of VoIP communications, which include distortion and artifacts caused by compression, transmission, and platform-specific processing. To this end, we propose VoIP-DNS, a multi-task learning framework that jointly optimizes noise suppression and the modeling of VoIP-specific acoustics for speech enhancement. We evaluate our approach on a diverse set of VoIP scenarios and show that it exceeds industry performance and outperforms state-of-the-art speech enhancement methods on VoIP applications. Our results demonstrate that models trained on DNS 2020 can be improved and tailored to different VoIP platforms using VoIP-DNS. These findings have important applications in areas such as speech recognition, voice assistants, and telecommunication.
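As a rough illustration of what jointly optimizing two objectives can look like, the sketch below combines a noise-suppression loss with an auxiliary VoIP-related loss as a weighted sum. It is a minimal sketch under stated assumptions, not the formulation used in this work: the function names, the use of mean squared error for both terms, and the `aux_weight` parameter are all hypothetical.

```python
from typing import Sequence


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between two non-empty, equal-length sequences."""
    if len(pred) == 0 or len(pred) != len(target):
        raise ValueError("pred and target must be non-empty and equal in length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def multitask_loss(
    enhanced: Sequence[float],
    clean: Sequence[float],
    voip_pred: Sequence[float],
    voip_target: Sequence[float],
    aux_weight: float = 0.5,  # hypothetical weighting, not taken from the paper
) -> float:
    """Weighted sum of a suppression loss and an auxiliary VoIP loss.

    Assumption: both tasks are scored with MSE purely for illustration.
    """
    suppression = mse(enhanced, clean)       # main task: match the clean speech
    auxiliary = mse(voip_pred, voip_target)  # auxiliary task: VoIP-specific target
    return suppression + aux_weight * auxiliary
```

In this sketch, `aux_weight` controls how strongly the auxiliary VoIP term influences training relative to the suppression term; setting it to zero recovers a single-task objective.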