Speech enhancement seeks to extract clean speech from noisy signals. Traditional deep learning methods face two challenges: making efficient use of the information in long speech sequences, and high computational cost. To address both, we introduce the Spiking Structured State Space Model (Spiking-S4), which combines the energy efficiency of Spiking Neural Networks (SNNs) with the long-range sequence modeling capability of Structured State Space Models (S4). Evaluation on the DNS Challenge and VoiceBank+Demand datasets shows that Spiking-S4 rivals existing Artificial Neural Network (ANN) methods while using fewer computational resources, as measured by parameter count and Floating Point Operations (FLOPs).
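To make the combination named in the abstract concrete, the following is a minimal illustrative sketch, not the Spiking-S4 architecture itself. It assumes a scalar (single-state) linear state space model discretized with zero-order hold, whose output drives a leaky integrate-and-fire (LIF) neuron with a hard reset. The function names (`ssm_scan`, `lif_spikes`, `spiking_ssm_layer`) and all parameter values are hypothetical choices made for illustration only.

```python
import math


def ssm_scan(u, a, b, c, dt):
    """Scalar linear SSM x' = a*x + b*u, y = c*x, stepped over the sequence u.

    Assumption: zero-order-hold discretization with step dt and a != 0.
    """
    a_bar = math.exp(a * dt)          # discrete state transition
    b_bar = (a_bar - 1.0) / a * b     # discrete input gain under zero-order hold
    x = 0.0
    ys = []
    for u_t in u:
        x = a_bar * x + b_bar * u_t   # linear recurrence carries long-range state
        ys.append(c * x)
    return ys


def lif_spikes(currents, decay=0.9, threshold=1.0):
    """Leaky integrate-and-fire neuron: emits 1 when the membrane crosses threshold."""
    v = 0.0
    spikes = []
    for i_t in currents:
        v = decay * v + i_t           # leaky integration of the input current
        if v >= threshold:
            spikes.append(1)
            v = 0.0                   # hard reset after a spike (illustrative choice)
        else:
            spikes.append(0)
    return spikes


def spiking_ssm_layer(u, a=-1.0, b=1.0, c=1.0, dt=0.1):
    """Hypothetical layer: SSM output is used as the input current of a LIF neuron."""
    return lif_spikes(ssm_scan(u, a, b, c, dt))


if __name__ == "__main__":
    # A constant input produces a sparse binary spike train.
    print(spiking_ssm_layer([1.0] * 20))
```

In this sketch the state recurrence is what carries information across long sequences, while the binary spike output is what makes sparse, event-driven computation possible downstream. A practical model would use many state dimensions, learned parameters, and a surrogate gradient for training, none of which are shown here.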