Embedded deep learning for one-dimensional data is an important technique for processing sensor data in the Internet of Things (IoT). In the past, CNNs were frequently used because they are simple to optimise for special embedded hardware such as FPGAs. This work proposes a novel LSTM cell optimisation aimed at energy-efficient inference on end devices. Using traffic speed prediction as a case study, a vanilla LSTM model with the optimised LSTM cell achieves 17534 inferences per second while consuming only 3.8 $\mu$J per inference on the XC7S15 FPGA from the Spartan-7 family. It achieves at least 5.4$\times$ higher throughput and 1.37$\times$ better energy efficiency than existing approaches.
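As a rough sanity check on these figures (a back-of-the-envelope sketch, not a result reported by the authors), assume the device runs inferences back to back at the stated throughput and that the 3.8 $\mu$J figure covers the same scope as the throughput measurement. The symbols $f$ (throughput), $E$ (energy per inference), $P$ (average power) and $t$ (average interval per inference) are introduced here for illustration only:

$$
P \approx f \cdot E = 17534\ \mathrm{s^{-1}} \times 3.8\ \mu\mathrm{J} \approx 66.6\ \mathrm{mW},
\qquad
t \approx \frac{1}{f} = \frac{1}{17534\ \mathrm{s^{-1}}} \approx 57\ \mu\mathrm{s}.
$$

Read the same way, and assuming that energy efficiency here means the inverse of energy per inference, the stated ratios would imply that the compared approaches reach at most about $17534 / 5.4 \approx 3247$ inferences per second and consume at least about $3.8 \times 1.37 \approx 5.2$ $\mu$J per inference.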