Noise suppression and echo cancellation are core speech enhancement tasks and are essential for smart devices and real-time communication. Because these algorithms run in voice processing front-ends and on edge devices, they must deliver real-time inference at low computational cost. Conventional noise suppression on edge devices typically trains an amplitude spectrum mask with a mean squared error (MSE) loss, an approach that has limitations. We introduce a novel projection loss function that departs from MSE to improve noise suppression. The method uses projection techniques to isolate key audio components from noise, significantly improving model performance. For echo cancellation, the loss function enables direct prediction on the pre-processed outputs of linear acoustic echo cancellation (LAEC), substantially improving performance. Our noise suppression model achieves near state-of-the-art results with only 3.1M parameters and a computational load of 0.4 GFlops/s. Moreover, our echo cancellation model outperforms replicated industry-leading models, offering a new perspective on speech enhancement.
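The abstract does not give the form of the projection loss, so the following is only an illustrative sketch of how a projection-based objective can differ from MSE. It uses the well-known orthogonal projection of an estimate onto the clean target (as in scale-invariant SDR-style measures) as a stand-in. The function names `mse_loss` and `projection_loss`, the test signals, and the choice of a residual-to-projection energy ratio are all assumptions made for illustration, not the authors' definition.

```python
import numpy as np


def mse_loss(estimate: np.ndarray, target: np.ndarray) -> float:
    """Plain mean squared error between estimate and target."""
    return float(np.mean((estimate - target) ** 2))


def projection_loss(estimate: np.ndarray, target: np.ndarray, eps: float = 1e-8) -> float:
    """Hypothetical projection-style loss (an assumption, not the paper's formula).

    The estimate is split into a component aligned with the target
    (its orthogonal projection onto the target) and a residual.
    The loss is the ratio of residual energy to projected energy.
    """
    scale = np.dot(estimate, target) / (np.dot(target, target) + eps)
    projected = scale * target       # part of the estimate aligned with the target
    residual = estimate - projected  # part orthogonal to the target (treated as noise)
    return float(np.sum(residual ** 2) / (np.sum(projected ** 2) + eps))


# Synthetic signals, chosen only for illustration.
rng = np.random.default_rng(0)
target = np.sin(2 * np.pi * 440 * np.arange(1600) / 16000)  # 0.1 s tone at 16 kHz
scaled = 0.5 * target                                       # clean but attenuated
noisy = target + 0.1 * rng.standard_normal(target.shape)    # correct level, added noise

loss_scaled_mse = mse_loss(scaled, target)
loss_scaled_proj = projection_loss(scaled, target)
loss_noisy_proj = projection_loss(noisy, target)
```

In this sketch, a purely attenuated copy of the target is penalized by MSE but scores close to zero under the projection-style loss, while added noise raises the projection-style loss. That illustrates one way a projection can separate the target-aligned component from everything else; the loss proposed in the paper may be defined differently.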