HARFLOW3D: A Latency-Oriented 3D-CNN Accelerator Toolflow for HAR on FPGA Devices

3D Convolutional Neural Networks (3D CNNs) have proven highly effective for Human Action Recognition (HAR) tasks, achieving state-of-the-art results. This study introduces a novel toolflow, based on a streaming architecture, for mapping such models onto FPGAs while taking into account both the model's inherent characteristics and the features of the target FPGA device. The HARFLOW3D toolflow takes as input a 3D CNN in ONNX format and a description of the FPGA's characteristics, and generates a design that minimizes the latency of the computation. The toolflow comprises five parts: i) a 3D CNN parser, ii) a performance and resource model, iii) a scheduling algorithm for executing 3D models on the generated hardware, iv) a resource-aware optimization engine tailored for 3D models, and v) an automated mapping to synthesizable code for FPGAs. Experiments on various pairs of 3D CNNs and FPGA systems show that the toolflow supports a broad range of models and devices. Furthermore, the toolflow produces high-performing results for 3D CNN models that have not been mapped to FPGAs before, demonstrating the potential of FPGA-based systems in this space. Overall, HARFLOW3D delivers latency competitive with a range of state-of-the-art hand-tuned approaches, achieving up to 5$\times$ better performance than some existing works.
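To make the interplay between a performance model and a resource-aware optimizer more concrete, here is a minimal Python sketch. It is not HARFLOW3D's actual model or search algorithm: the `Conv3D` class, the one-DSP-per-parallel-MAC cost, the divisor-based folding factors, and the exhaustive search are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from math import ceil


@dataclass(frozen=True)
class Conv3D:
    """Shape of one 3D convolution layer (hypothetical description, not the toolflow's)."""
    out_depth: int
    out_height: int
    out_width: int
    in_channels: int
    out_channels: int
    kernel: tuple  # (k_depth, k_height, k_width)

    def macs(self) -> int:
        """Multiply-accumulate operations needed for one inference of this layer."""
        kd, kh, kw = self.kernel
        return (self.out_depth * self.out_height * self.out_width
                * self.in_channels * self.out_channels * kd * kh * kw)


def latency_seconds(layer: Conv3D, parallel_macs: int, clock_hz: float) -> float:
    """Assumed performance model: cycles = MACs / parallelism, rounded up."""
    return ceil(layer.macs() / parallel_macs) / clock_hz


def divisors(n: int) -> list:
    return [d for d in range(1, n + 1) if n % d == 0]


def optimise(layer: Conv3D, dsp_budget: int, clock_hz: float):
    """Exhaustively pick channel folding factors that minimise latency.

    Assumed resource model: one DSP per parallel MAC unit.
    Returns (latency, in_fold, out_fold), or None if nothing fits the budget.
    """
    best = None
    for ci in divisors(layer.in_channels):
        for co in divisors(layer.out_channels):
            if ci * co > dsp_budget:
                continue  # design would not fit on the device
            lat = latency_seconds(layer, ci * co, clock_hz)
            if best is None or lat < best[0]:
                best = (lat, ci, co)
    return best


layer = Conv3D(8, 16, 16, in_channels=4, out_channels=8, kernel=(3, 3, 3))
result = optimise(layer, dsp_budget=16, clock_hz=200e6)
```

In this toy setting the search uses the full budget of 16 parallel MAC units. A real toolflow would also have to model memory bandwidth, on-chip buffers, and the scheduling of layers onto shared hardware, none of which is captured here.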

