Keyword Spotting (KWS) requires continuously monitoring an audio stream to detect predefined words, which calls for low-energy devices capable of always-on processing. Neuromorphic devices effectively address this energy challenge. However, the typical neuromorphic KWS pipeline, from microphone to Spiking Neural Network (SNN), entails multiple processing stages. Because Pulse Density Modulation (PDM) microphones are popular in modern devices and behave similarly to spiking neurons, we propose connecting the microphone directly to the SNN. This approach eliminates the intermediate stages, notably reducing computational cost. The system achieves an accuracy of 91.54\% on the Google Speech Command (GSC) dataset, surpassing the state of the art on the Spiking Speech Command (SSC) dataset, a bio-inspired spike encoding of GSC. Furthermore, the observed sparsity in network activity and connectivity indicates potential for very low energy consumption in a neuromorphic device implementation.
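The similarity between a PDM bit stream and a spike train can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes a first-order sigma-delta modulator (commercial PDM microphones often use higher-order ones), and the names `pdm_encode` and `spike_density`, the test tone, and the window length are all hypothetical choices made for illustration. Each 1-bit is read as a spike, and the local spike density tracks the input amplitude.

```python
import numpy as np


def pdm_encode(signal):
    """First-order sigma-delta modulation (an assumed, simplified model).

    Takes samples in [-1, 1] and returns a 0/1 stream whose local density
    of ones approximates (signal + 1) / 2.
    """
    bits = np.zeros(len(signal), dtype=np.uint8)
    integrator = 0.0
    feedback = 0.0  # previous output mapped to -1 / +1
    for i, x in enumerate(signal):
        integrator += x - feedback       # accumulate the quantization error
        bit = 1 if integrator >= 0.0 else 0
        bits[i] = bit                    # a 1 can be read as a spike
        feedback = 1.0 if bit else -1.0
    return bits


def spike_density(bits, window):
    """Mean number of spikes per sample over non-overlapping windows."""
    n = len(bits) // window * window
    return bits[:n].reshape(-1, window).mean(axis=1)


# Hypothetical test tone: one sine period every 2048 oversampled steps.
t = np.arange(8192)
tone = 0.8 * np.sin(2 * np.pi * t / 2048)

spikes = pdm_encode(tone)
density = spike_density(spikes, window=128)
reconstructed = 2.0 * density - 1.0      # map density back to [-1, 1]
```

Here `reconstructed` approximates the window-averaged tone, which is why a spiking layer that integrates the raw bit stream can, in principle, work without an explicit PDM-to-PCM conversion stage.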