The escalating challenge of managing vast sensor-generated data, particularly in audio applications, necessitates innovative solutions. Current systems face significant computational and storage demands, especially in real-time applications such as gunshot detection systems (GSDS), and the proliferation of edge sensors exacerbates these issues. This paper proposes a near-sensor model tailored for intelligent audio-sensing frameworks. Combining a Fast Fourier Transform (FFT) module, convolutional neural network (CNN) layers, and HyperDimensional Computing (HDC), our model delivers low-energy operation, rapid inference, and online learning. It maps efficiently onto ASIC implementations, offering superior energy efficiency compared to conventional embedded CPUs and GPUs, and is compatible with the trend of shrinking microphone sensor sizes. Comprehensive evaluations at both the software and hardware levels underscore the model's efficacy. Software assessments based on detailed ROC-curve analysis reveal a favorable trade-off between energy savings and quality loss, achieving up to 82.1% energy savings with only 1.39% quality loss. Hardware evaluations highlight the model's energy efficiency when implemented as an ASIC design, particularly in comparison with the Google Edge TPU, demonstrating its superiority over prevalent embedded CPUs and GPUs.