For Human Action Recognition (HAR) tasks, 3D Convolutional Neural Networks have proven to be highly effective, achieving state-of-the-art results. This study introduces a novel toolflow, based on a streaming architecture, for mapping such models onto FPGAs, taking into account both the model's inherent characteristics and the features of the target FPGA device. The HARFLOW3D toolflow takes as input a 3D CNN in ONNX format and a description of the FPGA characteristics, and generates a design that minimizes the latency of the computation. The toolflow comprises several components: i) a 3D CNN parser, ii) a performance and resource model, iii) a scheduling algorithm for executing 3D models on the generated hardware, iv) a resource-aware optimization engine tailored for 3D models, and v) an automated mapping to synthesizable code for FPGAs. The toolflow's ability to support a broad range of models and devices is shown through a number of experiments on various 3D CNN and FPGA system pairs. Furthermore, the toolflow has produced high-performing results for 3D CNN models that have not been mapped to FPGAs before, demonstrating the potential of FPGA-based systems in this space. Overall, HARFLOW3D delivers competitive latency compared to a range of state-of-the-art hand-tuned approaches, achieving up to 5$\times$ better performance than some existing works.
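To make the idea of resource-aware latency minimization concrete, the following is a minimal toy sketch and not HARFLOW3D's actual optimization engine or performance model. It assumes a deliberately simplified setting: each layer's latency is its multiply-accumulate count divided by the number of parallel MAC units assigned to it, layers run one after another, and the only FPGA resource is a budget of MAC units. The names `Layer`, `allocate_parallelism`, and `latency_cycles`, and all numbers, are hypothetical.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Layer:
    name: str
    macs: int  # multiply-accumulate operations (hypothetical figure)


def allocate_parallelism(layers, mac_budget):
    """Greedily assign parallel MAC units to layers under a total budget.

    Toy model (an assumption, not the paper's): latency of a layer is
    macs / units. Each step gives one more unit to the layer whose
    latency drops the most.
    """
    if not layers:
        return {}
    if mac_budget < len(layers):
        raise ValueError("budget must allow at least one unit per layer")

    alloc = {layer.name: 1 for layer in layers}
    macs = {layer.name: layer.macs for layer in layers}

    def gain(name):
        # Latency saved by adding one more unit to this layer.
        p = alloc[name]
        return macs[name] / p - macs[name] / (p + 1)

    # Max-heap on gain, implemented by negating the key.
    heap = [(-gain(layer.name), layer.name) for layer in layers]
    heapq.heapify(heap)

    for _ in range(mac_budget - len(layers)):
        _, name = heapq.heappop(heap)
        alloc[name] += 1
        heapq.heappush(heap, (-gain(name), name))
    return alloc


def latency_cycles(layers, alloc):
    """Total latency when layers execute sequentially (toy model)."""
    return sum(layer.macs / alloc[layer.name] for layer in layers)


if __name__ == "__main__":
    net = [Layer("conv1", 900), Layer("conv2", 100)]
    plan = allocate_parallelism(net, mac_budget=4)
    print(plan, latency_cycles(net, plan))
```

In this toy setting, the heavier layer receives most of the budget, which gives lower total latency than splitting the units evenly. A real toolflow would also have to model memory bandwidth, on-chip storage, and the other resource types of the target device.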