As the volume of data recorded by embedded edge sensors increases, particularly the discrete event streams produced by neuromorphic devices, there is a growing need for hardware-aware neural architectures that enable efficient, low-latency, and energy-conscious local processing. We present an FPGA implementation of event-graph neural networks for audio processing. We utilise an artificial cochlea that converts time-series signals into sparse event data, reducing memory and computation costs. Our architecture was implemented on a SoC FPGA and evaluated on two open-source datasets. On the classification task, our baseline floating-point model achieves 92.7% accuracy on the SHD dataset, only 2.4% below the state of the art, while requiring between 10x and 67x fewer parameters. On SSC, our models achieve 66.9-71.0% accuracy. Compared with FPGA-based spiking neural networks, our quantised model reaches 92.3% accuracy, outperforming them by up to 19.3% while reducing resource usage and latency. For SSC, we report the first hardware-accelerated evaluation. We further demonstrate the first end-to-end FPGA implementation of event-audio keyword spotting (KWS), combining graph convolutional layers with recurrent sequence modelling. The system achieves up to 95% word-end detection accuracy with only 10.53 microseconds of latency and 1.18 W power consumption, establishing a strong benchmark for energy-efficient event-driven KWS.