Long Short-Term Memory networks (LSTMs) are a vital deep learning technique for performing on-device time-series analysis on the local sensor data streams of embedded devices. In this paper, we propose a new hardware accelerator design for LSTMs, specifically optimised for resource-scarce embedded Field Programmable Gate Arrays (FPGAs). Our design improves execution speed and reduces energy consumption compared to related work. Moreover, it can be adapted to different situations through a number of optimisation parameters, such as the use of DSPs or the implementation of the activation functions. We present our key design decisions and evaluate the resulting performance. Our accelerator achieves an energy efficiency of 11.89 GOP/s/W during real-time inference at 32,873 samples/s.
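For readers unfamiliar with the computation an LSTM accelerator must implement, the following is a minimal NumPy sketch of a single LSTM cell update (the standard formulation with input, forget, cell, and output gates). This is illustrative only; it is not the fixed-point datapath of the proposed accelerator, and all names and dimensions are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM cell step; rows of W, U, b are stacked as
    [input gate, forget gate, cell candidate, output gate]."""
    H = h_prev.shape[0]
    z = W @ x + U @ h_prev + b        # (4H,) pre-activations
    i = sigmoid(z[0:H])               # input gate
    f = sigmoid(z[H:2 * H])           # forget gate
    g = np.tanh(z[2 * H:3 * H])       # candidate cell state
    o = sigmoid(z[3 * H:4 * H])       # output gate
    c = f * c_prev + i * g            # new cell state
    h = o * np.tanh(c)                # new hidden state
    return h, c

# Tiny example: 3 input features, 2 hidden units, a 5-step sequence.
rng = np.random.default_rng(0)
X, H = 3, 2
W = rng.standard_normal((4 * H, X)) * 0.1
U = rng.standard_normal((4 * H, H)) * 0.1
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for t in range(5):
    h, c = lstm_step(rng.standard_normal(X), h, c, W, U, b)
print(h.shape)  # (2,)
```

The matrix-vector products dominate the operation count, which is why DSP usage and the hardware implementation of the sigmoid and tanh activations are the natural optimisation parameters for an FPGA design.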