Recurrent Neural Networks (RNNs) are vital for sequential data processing. Long Short-Term Memory Autoencoders (LSTM-AEs) are particularly effective for unsupervised anomaly detection in time-series data. However, their inherent sequential dependencies limit parallel computation. While previous work has explored FPGA-based acceleration for LSTM networks, those efforts have typically focused on optimizing a single LSTM layer at a time. We introduce a novel FPGA-based accelerator built on a dataflow architecture that exploits temporal parallelism, processing different timesteps of a sequence concurrently across multiple layers. Experimental evaluations on four representative LSTM-AE models of varying width and depth, implemented on a Zynq UltraScale+ MPSoC FPGA, demonstrate significant advantages over CPU (Intel Xeon Gold 5218R) and GPU (NVIDIA V100) implementations. Our accelerator achieves latency speedups of up to 79.6x vs. CPU and 18.2x vs. GPU, alongside energy-per-timestep reductions of up to 1722x vs. CPU and 59.3x vs. GPU. Together with superior scalability in network depth, these results highlight the potential of our approach for high-performance, real-time, power-efficient LSTM-AE-based anomaly detection on FPGAs.
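The temporal parallelism described above can be pictured as a layer-level pipeline: while layer l works on timestep t, layer l+1 can already consume timestep t-1, so up to one timestep per layer is in flight at once. The following minimal Python sketch prints such a schedule; the layer count, sequence length, and all names are hypothetical illustrations, not values from the paper, and the real design realizes this schedule in hardware rather than software.

```python
# Software model of a temporal-parallel dataflow schedule:
# one processing element per LSTM layer, each one timestep
# behind the layer that feeds it.

NUM_LAYERS = 4   # hypothetical LSTM-AE depth (encoder + decoder layers)
SEQ_LEN = 6      # hypothetical number of timesteps per sequence

# At pipeline step s, layer l works on timestep (s - l): once layer 0
# finishes timestep t, layer 1 consumes it while layer 0 starts t + 1.
for step in range(SEQ_LEN + NUM_LAYERS - 1):
    active = [(l, step - l) for l in range(NUM_LAYERS)
              if 0 <= step - l < SEQ_LEN]
    print(f"step {step}: " +
          ", ".join(f"layer {l} -> timestep {t}" for l, t in active))
```

Note that each layer still honors its own recurrence by visiting timesteps in order; the concurrency comes from overlapping different layers on different timesteps, which is how a dataflow design can hide the sequential dependency that otherwise serializes multi-layer LSTM execution.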