Speech enhancement (SE) is crucial for reliable communication devices and robust speech recognition systems. Although conventional artificial neural networks (ANNs) have demonstrated remarkable performance in SE, they require significant computational power and incur high energy costs. In this paper, we propose a novel approach to SE using a spiking neural network (SNN) based on a U-Net architecture. SNNs are well suited to processing data with a temporal dimension, such as speech, and are known for their energy-efficient implementation on neuromorphic hardware. SNNs are thus interesting candidates for real-time applications on devices with limited resources. The primary objective of the current work is to develop an SNN-based model with performance comparable to that of a state-of-the-art ANN model for SE. We train a deep SNN using surrogate-gradient-based optimization and evaluate its performance using perceptual objective tests under different signal-to-noise ratios and real-world noise conditions. Our results demonstrate that the proposed energy-efficient SNN model outperforms the Intel Neuromorphic Deep Noise Suppression Challenge (Intel N-DNS Challenge) baseline solution and achieves acceptable performance compared to an equivalent ANN model.
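To illustrate what surrogate-gradient-based optimization refers to, the following is a minimal sketch, not the model described in the abstract. A spiking neuron emits a spike through a hard threshold whose derivative is zero almost everywhere, so training replaces that derivative with a smooth stand-in during the backward pass. The abstract does not state which neuron model or surrogate function is used; the leaky integrate-and-fire (LIF) dynamics, the fast-sigmoid surrogate, and the names `lif_forward`, `surrogate_grad`, `beta`, `threshold`, and `slope` are all assumptions chosen for illustration.

```python
def lif_forward(inputs, beta=0.9, threshold=1.0):
    """Simulate one leaky integrate-and-fire neuron (assumed model).

    Returns the spike train and the membrane potential recorded
    just before any reset at each time step.
    """
    membrane = 0.0
    spikes, potentials = [], []
    for current in inputs:
        # Leaky integration: decay the previous potential, add the input.
        membrane = beta * membrane + current
        potentials.append(membrane)
        # Forward pass uses a hard threshold (Heaviside step).
        spike = 1.0 if membrane >= threshold else 0.0
        spikes.append(spike)
        # Soft reset: subtract the threshold after a spike.
        membrane -= spike * threshold
    return spikes, potentials


def surrogate_grad(membrane, threshold=1.0, slope=10.0):
    """Fast-sigmoid surrogate for d(spike)/d(membrane) (assumed choice).

    Used only in the backward pass, in place of the true derivative of
    the step function, which is zero everywhere except at the threshold.
    """
    return 1.0 / (1.0 + slope * abs(membrane - threshold)) ** 2


if __name__ == "__main__":
    spikes, potentials = lif_forward([0.6, 0.6, 0.0, 0.6])
    grads = [surrogate_grad(v) for v in potentials]
    print(spikes)
    print(grads)
```

The surrogate is largest when the membrane potential sits at the threshold and falls off on either side, so neurons close to firing receive the strongest learning signal. In a deep-learning framework, the same idea is usually expressed as a custom autograd function whose forward pass is the step and whose backward pass multiplies the incoming gradient by the surrogate.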