Speech enhancement aims to recover clean speech from noisy signals. Conventional deep learning methods face two challenges: exploiting information across long speech sequences efficiently, and high computational cost. To address both, we introduce the Spiking Structured State Space Model (Spiking-S4), which combines the energy efficiency of Spiking Neural Networks (SNNs) with the long-range sequence modeling capability of Structured State Space Models (S4). Evaluations on the DNS Challenge and VoiceBank+Demand datasets show that Spiking-S4 is competitive with existing Artificial Neural Network (ANN) methods while using fewer computational resources, as evidenced by fewer parameters and fewer Floating Point Operations (FLOPs).
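To make the idea of pairing a state space model with spiking neurons concrete, the following is a minimal, illustrative sketch, not the authors' Spiking-S4 architecture. It assumes a real-valued diagonal state matrix (a simplification of S4's structured parameterization), zero-order-hold (ZOH) discretization, and a leaky integrate-and-fire (LIF) neuron with soft reset that turns the SSM output into binary spikes. All function names (`discretize_zoh`, `ssm_scan`, `lif_spikes`, `spiking_ssm_layer`) and parameter values are hypothetical.

```python
import math
from typing import List, Tuple


def discretize_zoh(a: List[float], b: List[float], dt: float) -> Tuple[List[float], List[float]]:
    """ZOH discretization of a diagonal SSM x'(t) = a*x(t) + b*u(t), assuming every a_i < 0."""
    a_bar = [math.exp(dt * ai) for ai in a]
    b_bar = [(math.exp(dt * ai) - 1.0) / ai * bi for ai, bi in zip(a, b)]
    return a_bar, b_bar


def ssm_scan(u: List[float], a_bar: List[float], b_bar: List[float],
             c: List[float], d: float) -> List[float]:
    """Linear recurrence x_k = a_bar*x_{k-1} + b_bar*u_k, readout y_k = c.x_k + d*u_k."""
    x = [0.0] * len(a_bar)
    ys = []
    for uk in u:
        x = [ab * xi + bb * uk for ab, xi, bb in zip(a_bar, x, b_bar)]
        ys.append(sum(ci * xi for ci, xi in zip(c, x)) + d * uk)
    return ys


def lif_spikes(y: List[float], beta: float = 0.9, threshold: float = 1.0) -> List[int]:
    """Leaky integrate-and-fire: leak and integrate y, emit a 0/1 spike, subtract threshold on firing."""
    v = 0.0
    spikes = []
    for yk in y:
        v = beta * v + yk                 # leaky integration of the SSM output
        s = 1 if v >= threshold else 0    # binary spike
        v -= s * threshold                # soft reset
        spikes.append(s)
    return spikes


def spiking_ssm_layer(u: List[float], a: List[float], b: List[float], c: List[float],
                      d: float, dt: float = 0.1, beta: float = 0.9,
                      threshold: float = 1.0) -> List[int]:
    """Hypothetical layer: discretized diagonal SSM followed by LIF spiking."""
    a_bar, b_bar = discretize_zoh(a, b, dt)
    return lif_spikes(ssm_scan(u, a_bar, b_bar, c, d), beta, threshold)


if __name__ == "__main__":
    step = [1.0] * 20
    print(spiking_ssm_layer(step, a=[-0.5, -2.0], b=[1.0, 1.0], c=[1.0, 1.0], d=0.0))
```

The decaying diagonal modes (`a_bar` between 0 and 1) carry information across many time steps, while the LIF stage emits only 0/1 values. Because downstream layers would then receive binary inputs, their multiply-accumulate operations can in principle reduce to additions, which is the usual source of SNN efficiency; how Spiking-S4 arranges these components is not specified here.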