Thanks to Deep Neural Networks (DNNs), the accuracy of Keyword Spotting (KWS) has improved substantially. However, because KWS systems are usually deployed on edge devices, energy efficiency is a critical requirement alongside accuracy. Here, we exploit the energy efficiency of Spiking Neural Networks (SNNs) and propose an end-to-end lightweight KWS model. The model consists of two novel modules: 1) a Global-Local Spiking Convolution (GLSC) module and 2) a Bottleneck-PLIF module. Compared with hand-crafted feature extraction methods, the GLSC module extracts speech features that are sparser and more energy-efficient to compute, and it yields better performance. The Bottleneck-PLIF module further processes the signals from the GLSC module, aiming to achieve higher accuracy with fewer parameters. Extensive experiments are conducted on the Google Speech Commands Dataset (V1 and V2). The results show that our method achieves competitive performance among SNN-based KWS models with fewer parameters.
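The abstract does not spell out the neuron dynamics behind the Bottleneck-PLIF module. As a rough illustration only, the sketch below assumes that PLIF denotes a parametric leaky integrate-and-fire neuron, i.e. a LIF neuron whose membrane time constant is a learnable parameter, written here as 1/tau = sigmoid(w). The function names, the hard reset, and the default threshold are hypothetical choices for this sketch and are not taken from the authors' implementation.

```python
import math


def plif_step(v, x, w, v_threshold=1.0, v_reset=0.0):
    """One time step of an assumed parametric LIF neuron with hard reset.

    v: membrane potential carried over from the previous step
    x: input current at this step
    w: raw (learnable) parameter; 1/tau = sigmoid(w), so tau > 1
    """
    k = 1.0 / (1.0 + math.exp(-w))       # k = 1/tau, always in (0, 1)
    h = v + k * (x - (v - v_reset))      # leaky integration (charging)
    spike = 1 if h >= v_threshold else 0  # fire when the threshold is reached
    v_next = v_reset if spike else h      # hard reset after a spike
    return spike, v_next


def run_plif(inputs, w=0.0, v_threshold=1.0, v_reset=0.0):
    """Run the neuron over a sequence of inputs and return the spike train."""
    v = v_reset
    spikes = []
    for x in inputs:
        s, v = plif_step(v, x, w, v_threshold, v_reset)
        spikes.append(s)
    return spikes


if __name__ == "__main__":
    # A strong constant input makes the neuron fire periodically,
    # while a weak input never reaches the threshold (no spikes at all).
    print(run_plif([1.5] * 4))
    print(run_plif([0.5] * 10))
```

Under these assumptions the output is binary and frequently zero, which is the kind of sparsity that SNN-based models rely on for energy efficiency; in training, `w` would be optimized together with the network weights.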