Efficient processing of continuous audio streams remains a key challenge for real-time and resource-constrained systems. This paper introduces a neuromorphic trigger for audio event detection, based on a spiking neural network (SNN) that selectively gates input to downstream models. The proposed trigger acts as a low-cost front-end, identifying salient audio segments and forwarding only these to a more computationally intensive model for tasks such as classification. The trigger is implemented as a lightweight fully connected SNN and evaluated on two representative tasks: Anomalous Sound Detection (ASD) and Sound Event Detection (SED). For ASD, the trigger achieves a one-second segment-based F1 score of 0.97 on a class-agnostic form of the URBAN-SED dataset, demonstrating high reliability in identifying relevant audio regions. For SED, the trigger is combined with the Dang classifier on the DCASE 2017 Challenge Task 2 dataset, showing a potential $42.6\times$ reduction in FLOPs while reducing the lower bound of the event-based error rate from 0.41 to 0.25. These results highlight the potential of neuromorphic triggers as real-time, energy-efficient front-end filters, enabling substantial reductions in computational cost.
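The gating idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a two-layer fully connected network of leaky integrate-and-fire (LIF) neurons with hand-picked weights, and a rule that forwards a segment when the output neuron spikes at least once within it. The names `lif_trigger`, `gate_segments`, and `run_gated`, the decay factor `beta`, the threshold, and the segment length are all hypothetical choices made for illustration.

```python
import numpy as np


def lif_trigger(features, w_in, w_out, beta=0.9, threshold=1.0):
    """Run a small fully connected LIF network over feature frames.

    features: (T, n_in) array, one row per time step (assumed input format).
    w_in:     (n_in, n_hidden) input-to-hidden weights (illustrative values).
    w_out:    (n_hidden,) hidden-to-output weights (illustrative values).
    Returns a boolean array of length T, True where the output neuron spiked.
    """
    hidden_v = np.zeros(w_in.shape[1])  # hidden membrane potentials
    out_v = 0.0                         # output membrane potential
    fired = np.zeros(len(features), dtype=bool)
    for t, x in enumerate(features):
        # Leaky integration of the input current, then threshold and reset.
        hidden_v = beta * hidden_v + x @ w_in
        hidden_spikes = hidden_v >= threshold
        hidden_v[hidden_spikes] = 0.0
        # The output neuron integrates the hidden spikes in the same way.
        out_v = beta * out_v + hidden_spikes.astype(float) @ w_out
        if out_v >= threshold:
            fired[t] = True
            out_v = 0.0
    return fired


def gate_segments(fired, frames_per_segment, min_spikes=1):
    """Mark a segment as salient if it holds at least min_spikes output spikes."""
    n_seg = len(fired) // frames_per_segment
    trimmed = fired[: n_seg * frames_per_segment]
    counts = trimmed.reshape(n_seg, frames_per_segment).sum(axis=1)
    return counts >= min_spikes


def run_gated(segments, gate, classifier):
    """Call the expensive downstream model only on segments the trigger kept."""
    return [classifier(seg) if keep else None for seg, keep in zip(segments, gate)]


# Toy input: 40 frames of 4 features, with activity only in frames 10..19.
feats = np.zeros((40, 4))
feats[10:20] = 1.0
fired = lif_trigger(feats, np.full((4, 3), 0.5), np.full(3, 0.5))
gate = gate_segments(fired, frames_per_segment=10)
segments = np.split(feats, 4)
# Stand-in for the downstream classifier (hypothetical).
labels = run_gated(segments, gate, classifier=lambda seg: "event")
print(gate, labels)
```

In this toy run only one of the four segments reaches the stand-in classifier, which is the source of the compute saving. The fraction forwarded here comes from the synthetic input and says nothing about the $42.6\times$ FLOPs reduction reported in the abstract.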