Brain-inspired spiking neural networks (SNNs) have demonstrated great potential for temporal signal processing. However, their performance in speech processing remains limited by the lack of an effective auditory front-end. To address this limitation, we introduce Spiking-LEAF, a learnable auditory front-end designed for SNN-based speech processing. Spiking-LEAF combines a learnable filter bank with a novel two-compartment spiking neuron model called IHC-LIF. IHC-LIF neurons draw inspiration from the structure of inner hair cells (IHCs) and use segregated dendritic and somatic compartments to capture the multi-scale temporal dynamics of speech signals. In addition, IHC-LIF neurons incorporate a lateral feedback mechanism and a spike regularization loss to improve spike encoding efficiency. On keyword spotting and speaker identification tasks, Spiking-LEAF outperforms both state-of-the-art spiking auditory front-ends and conventional real-valued acoustic features in classification accuracy, noise robustness, and encoding efficiency.
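The abstract names the ingredients of IHC-LIF (separate dendritic and somatic compartments, lateral feedback, a spike regularization loss) but does not give its equations. The sketch below is therefore only a generic illustration of how such pieces might fit together, not the IHC-LIF model itself. Every name and constant in it (`TwoCompartmentLIF`, `dendrite_decay`, `soma_decay`, `feedback`, `spike_rate_penalty`) is hypothetical, and the update rules are assumed: a slowly leaking dendrite, a quickly leaking soma with soft reset, spike-triggered inhibition fed back to the dendrite, and a penalty proportional to the mean firing rate.

```python
from dataclasses import dataclass


@dataclass
class TwoCompartmentLIF:
    """Illustrative two-compartment LIF neuron (NOT the paper's IHC-LIF equations)."""

    dendrite_decay: float = 0.9  # assumed: slow leak, long time scale
    soma_decay: float = 0.5      # assumed: fast leak, short time scale
    threshold: float = 1.0       # assumed firing threshold
    feedback: float = 0.5        # assumed strength of spike-triggered feedback

    def run(self, inputs):
        """Return a list of 0/1 spikes, one per input sample."""
        dendrite, soma, prev_spike = 0.0, 0.0, 0
        spikes = []
        for x in inputs:
            # Dendrite integrates the input slowly; the previous spike inhibits it.
            dendrite = self.dendrite_decay * dendrite + x - self.feedback * prev_spike
            # Soma integrates the dendritic potential quickly; soft reset after a spike.
            soma = self.soma_decay * soma + dendrite - self.threshold * prev_spike
            prev_spike = 1 if soma >= self.threshold else 0
            spikes.append(prev_spike)
        return spikes


def spike_rate_penalty(spikes, weight=0.1):
    """Assumed regularizer: weight times the mean firing rate of a spike train."""
    return weight * sum(spikes) / len(spikes) if spikes else 0.0


if __name__ == "__main__":
    neuron = TwoCompartmentLIF()
    train = neuron.run([0.3] * 20)  # constant drive as a toy input
    print(train, spike_rate_penalty(train))
```

In a trainable model, a penalty of this kind would be added to the task loss so that the front-end is pushed toward sparser spike codes. The threshold step would also need a surrogate gradient, which this plain-Python sketch omits.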