Speech enhancement (SE) improves communication in noisy environments and benefits areas such as automatic speech recognition, hearing aids, and telecommunications. Because these domains are typically power-constrained and event-based while requiring low latency, neuromorphic algorithms in the form of spiking neural networks (SNNs) have great potential. Yet current effective SNN solutions require a contextual sampling window that imposes substantial latency, typically around 32 ms, which is too long for many applications. Inspired by the Dual-Path Recurrent Neural Network (DPRNN) in classical neural networks, we develop a two-phase, time-domain streaming SNN framework: the Dual-Path Spiking Neural Network (DPSNN). In the DPSNN, the first phase uses Spiking Convolutional Neural Networks (SCNNs) to capture global contextual information, while the second phase uses Spiking Recurrent Neural Networks (SRNNs) to focus on frequency-related features. In addition, a regularizer that suppresses activation further enhances the energy efficiency of our DPSNNs. Evaluating on the VCTK and Intel DNS datasets, we demonstrate that our approach achieves the very low latency (approximately 5 ms) required for applications such as hearing aids, while delivering excellent signal-to-noise ratio (SNR), perceptual quality, and energy efficiency.
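To make the 32 ms versus 5 ms comparison concrete, the sketch below converts a window length into the number of audio samples a streaming model must buffer before it can emit output, and splits a signal into such chunks. It assumes a 16 kHz sampling rate, and the helper names `window_samples` and `stream_frames` are hypothetical; the DPSNN's actual frame and hop configuration is not specified here. It counts only buffering delay, not compute time or any lookahead.

```python
SR = 16_000  # assumed sampling rate in Hz (hypothetical; not taken from the paper)


def window_samples(latency_ms: float, sample_rate_hz: int) -> int:
    """Hypothetical helper: samples that must be buffered to fill a window of latency_ms."""
    return int(round(latency_ms * sample_rate_hz / 1000))


def stream_frames(signal: list, frame_len: int):
    """Hypothetical helper: yield consecutive non-overlapping frames, as a streaming
    model would receive them. Trailing samples that do not fill a frame are dropped."""
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        yield signal[start:start + frame_len]


long_window = window_samples(32, SR)   # contextual window of prior SNN approaches
short_window = window_samples(5, SR)   # low-latency streaming target

one_second = [0.0] * SR                # one second of silent audio
n_frames = sum(1 for _ in stream_frames(one_second, short_window))

print(long_window, short_window, n_frames)  # 512 80 200
```

At 16 kHz, a 32 ms window means waiting for 512 samples before processing can begin, whereas a 5 ms window needs only 80, so one second of audio is handled as 200 small chunks.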