Embedded deep learning for one-dimensional data is an important technique for processing sensor data in the Internet of Things (IoT). In the past, CNNs were frequently used because they are simple to optimise for specialised embedded hardware such as FPGAs. This work proposes a novel LSTM cell optimisation aimed at energy-efficient inference on end devices. Using traffic speed prediction as a case study, a vanilla LSTM model with the optimised LSTM cell achieves 17534 inferences per second while consuming only 3.8 $\mu$J per inference on the \textit{XC7S15} FPGA from the \textit{Spartan-7} family. Its throughput is at least 5.4$\times$ higher, and its energy efficiency at least 1.37$\times$ better, than existing approaches.
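To put the two headline figures in context, the following minimal sketch derives the per-inference time and average power they imply. It assumes that inferences run back to back at the stated throughput and that the 3.8 $\mu$J figure covers the whole inference; neither derived value is reported in the abstract, and the function names are illustrative only.

```python
# Figures quoted in the abstract.
THROUGHPUT_INF_PER_S = 17534   # inferences per second
ENERGY_PER_INF_UJ = 3.8        # microjoules per inference


def implied_time_per_inference_us(throughput_inf_per_s: float) -> float:
    """Time per inference in microseconds, assuming back-to-back execution."""
    return 1e6 / throughput_inf_per_s


def implied_average_power_mw(throughput_inf_per_s: float, energy_per_inf_uj: float) -> float:
    """Average power in milliwatts: (inferences/s) * (uJ/inference) = uW, then uW -> mW."""
    return throughput_inf_per_s * energy_per_inf_uj * 1e-3


time_us = implied_time_per_inference_us(THROUGHPUT_INF_PER_S)
power_mw = implied_average_power_mw(THROUGHPUT_INF_PER_S, ENERGY_PER_INF_UJ)

print(f"Implied time per inference: {time_us:.2f} us")
print(f"Implied average power:      {power_mw:.2f} mW")
```

Under these assumptions the figures correspond to roughly 57 microseconds per inference and an average power draw of about 67 mW during continuous inference.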