Convolutional neural network (CNN) accelerators implemented on Field-Programmable Gate Arrays (FPGAs) are typically designed with a primary focus on maximizing performance, often measured in giga-operations per second (GOPS). However, real-world embedded deep learning (DL) applications impose multiple constraints related to latency, power consumption, area, and cost. This work presents a hardware-software (HW/SW) co-design methodology in which a CNN accelerator is described using high-level synthesis (HLS) tools that ease the parameterization of the design, enabling more effective optimization across multiple design constraints. Our experimental results demonstrate that the proposed design methodology outperforms non-parameterized design approaches and can be readily extended to other types of DL applications.
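To illustrate what HLS-based parameterization might look like in practice, the sketch below shows a 2D convolution kernel whose dimensions are compile-time template parameters, in the style accepted by HLS tools such as Vitis HLS. The function name, parameter names, and the pipelining comment are hypothetical illustrations, not taken from the paper; the point is that exposing such knobs at compile time lets a design-space exploration flow resynthesize the same source under different latency/area trade-offs.

```cpp
#include <array>
#include <cstddef>

// Illustrative sketch only (not the paper's accelerator): a 2D convolution
// whose input size IN and kernel size K are compile-time design parameters.
// An HLS tool can unroll/pipeline these fixed-bound loops into hardware,
// and changing the template arguments changes the synthesized circuit.
template <std::size_t IN, std::size_t K>
std::array<std::array<float, IN - K + 1>, IN - K + 1>
conv2d(const std::array<std::array<float, IN>, IN>& input,
       const std::array<std::array<float, K>, K>& weights) {
    constexpr std::size_t OUT = IN - K + 1;  // valid (no-padding) output size
    std::array<std::array<float, OUT>, OUT> output{};
    for (std::size_t r = 0; r < OUT; ++r) {
        for (std::size_t c = 0; c < OUT; ++c) {
            // #pragma HLS PIPELINE II=1  // an HLS directive would go here
            float acc = 0.0f;
            for (std::size_t i = 0; i < K; ++i)
                for (std::size_t j = 0; j < K; ++j)
                    acc += input[r + i][c + j] * weights[i][j];
            output[r][c] = acc;
        }
    }
    return output;
}
```

Because the loop bounds are template parameters rather than runtime values, each (IN, K) configuration yields a distinct, fully static kernel that the HLS scheduler can optimize for a given latency, power, or area budget.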