Computational complexity is critical when deploying deep learning-based speech denoising models for on-device applications. Most prior research has focused on optimizing model architectures to meet specific computational cost constraints, often creating a distinct neural network architecture for each complexity budget. This study investigates complexity scaling for speech denoising, aiming to consolidate models of various complexities into a unified architecture. We present a Multi-Path Transform-based (MPT) architecture that handles both low- and high-complexity scenarios. A series of MPT networks achieves high performance across a wide range of computational complexities on the DNS challenge dataset. Moreover, inspired by scaling experiments in natural language processing, we explore the empirical relationship between model performance and computational cost on the denoising task. As the complexity, measured in multiply-accumulate operations (MACs), is scaled from 50M/s to 15G/s on MPT networks, we observe that PESQ-WB and SI-SNR increase linearly with the logarithm of MACs, a finding that might contribute to the understanding and application of complexity scaling in speech denoising tasks.
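The log-linear relationship described above can be written as score ≈ a + b · log10(MACs). The following is a minimal sketch of how such a trend could be fitted with ordinary least squares, and it is not the authors' procedure. The function names `fit_log_linear` and `predict` are hypothetical, and the scores are synthetic placeholders generated from an assumed slope, not results from the paper.

```python
import math


def fit_log_linear(macs, scores):
    """Least-squares fit of score ≈ a + b * log10(MACs); returns (a, b)."""
    xs = [math.log10(m) for m in macs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(scores) / n
    # Closed-form simple linear regression on (log10(MACs), score) pairs.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, scores))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def predict(a, b, macs):
    """Evaluate the fitted trend at a given MACs-per-second budget."""
    return a + b * math.log10(macs)


# Complexity budgets spanning the range named in the abstract (50M/s to 15G/s).
macs_per_second = [50e6, 200e6, 1e9, 4e9, 15e9]

# ASSUMPTION: synthetic placeholder scores lying exactly on a line with
# slope 0.5 per decade of MACs. These are NOT measurements from the paper.
placeholder_scores = [2.0 + 0.5 * math.log10(m / 50e6) for m in macs_per_second]

a, b = fit_log_linear(macs_per_second, placeholder_scores)
```

Under this model, each tenfold increase in MACs adds a constant `b` to the metric, so the slope can be read as the gain per decade of compute. With real measurements, the same fit could be run separately for PESQ-WB and SI-SNR.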