For Human Action Recognition (HAR) tasks, 3D Convolutional Neural Networks (3D CNNs) have proven to be highly effective, achieving state-of-the-art results. This study introduces a novel streaming-architecture-based toolflow for mapping such models onto FPGAs, taking into account both the model's inherent characteristics and the features of the targeted FPGA device. The HARFLOW3D toolflow takes as input a 3D CNN in ONNX format and a description of the FPGA characteristics, and generates a design that minimizes the latency of the computation. The toolflow comprises several parts: i) a 3D CNN parser, ii) a performance and resource model, iii) a scheduling algorithm for executing 3D models on the generated hardware, iv) a resource-aware optimization engine tailored for 3D models, and v) an automated mapping to synthesizable code for FPGAs. The toolflow's ability to support a broad range of models and devices is shown through a number of experiments on various 3D CNN and FPGA system pairs. Furthermore, the toolflow has produced high-performing results for 3D CNN models that have not been mapped to FPGAs before, demonstrating the potential of FPGA-based systems in this space. Overall, HARFLOW3D delivers competitive latency compared to a range of state-of-the-art hand-tuned approaches, achieving up to 5$\times$ better performance than some existing works.