Spiking Neural Networks (SNNs) offer a promising approach to reducing energy consumption and computational demands, which makes them well suited to embedded machine learning in edge applications. However, data from conventional digital sensors must first be converted into spike trains before it can be processed with neuromorphic computing technologies. Environmental sound classification is particularly challenging because of highly variable frequency content, background noise, and overlapping acoustic events. Despite these challenges, most studies on spike-based audio encoding focus on speech processing, leaving non-speech environmental sounds underexplored. In this work, we conduct a comprehensive comparison of widely used spike encoding techniques and evaluate their effectiveness on the ESC-10 dataset. Understanding how the choice of encoding affects environmental sound processing allows researchers and practitioners to select the most suitable approach for real-world applications such as smart surveillance, environmental monitoring, and industrial acoustic analysis. This study serves as a benchmark for spike encoding in environmental sound classification and provides a foundational reference for future research in neuromorphic audio processing.
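To make the conversion step concrete, the following is a minimal sketch of one common family of spike encoders, threshold-based (send-on-delta) encoding, applied to a sampled waveform. The abstract does not state which encoding techniques are compared, so this scheme is chosen purely for illustration; the function name `delta_encode`, the `threshold` parameter, and the synthetic test tone are all assumptions and are not taken from the study.

```python
import math


def delta_encode(signal, threshold):
    """Illustrative send-on-delta encoder (hypothetical helper, not from the paper).

    Emits +1 when the signal has risen by at least `threshold` above the
    current reference level, -1 when it has fallen by at least `threshold`
    below it, and 0 otherwise. After each spike the reference moves by one
    threshold step in the direction of the spike.
    """
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    if not signal:
        return []

    reference = signal[0]
    spikes = [0]  # the first sample only initialises the reference level
    for sample in signal[1:]:
        diff = sample - reference
        if diff >= threshold:
            spikes.append(1)
            reference += threshold
        elif diff <= -threshold:
            spikes.append(-1)
            reference -= threshold
        else:
            spikes.append(0)
    return spikes


if __name__ == "__main__":
    # Synthetic 5 Hz tone sampled at 200 Hz for one second (assumed values).
    sample_rate = 200
    tone = [math.sin(2 * math.pi * 5 * n / sample_rate) for n in range(sample_rate)]
    train = delta_encode(tone, threshold=0.1)
    active = sum(1 for s in train if s != 0)
    print(f"{active} spikes over {len(train)} time steps")
```

In this sketch the threshold controls the trade-off between spike count and signal fidelity: a smaller threshold tracks the waveform more closely but produces more spikes, which is the kind of design choice an encoding comparison would need to account for.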