ATHEENA: A Toolflow for Hardware Early-Exit Network Automation

The continued need for improvements in accuracy, throughput, and efficiency of Deep Neural Networks has resulted in a multitude of methods that make the most of custom architectures on FPGAs. These include the creation of hand-crafted networks and the use of quantization and pruning to reduce extraneous network parameters. However, with the potential of static solutions already well exploited, we propose shifting the focus to the varying difficulty of individual data samples to further improve efficiency and reduce the average compute required for classification. Input-dependent computation allows the network to make runtime decisions to finish a task early if the result meets a confidence threshold. Early-Exit network architectures have become an increasingly popular way to implement such behaviour in software. We present A Toolflow for Hardware Early-Exit Network Automation (ATHEENA), an automated FPGA toolflow that leverages the probability of samples exiting early from such networks to scale the resources allocated to different sections of the network. The toolflow uses the data-flow model of fpgaConvNet, extended to support Early-Exit networks, as well as Design Space Exploration to optimize the generated streaming architecture hardware, with the goal of increasing throughput or reducing area while maintaining accuracy. Experimental results on three different networks demonstrate a throughput increase of $2.00\times$ to $2.78\times$ compared to an optimized baseline network implementation with no early exits. Additionally, the toolflow can match the throughput of the same baseline with as little as $46\%$ of the resources the baseline requires.
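The two ideas in the abstract, a confidence-threshold exit decision and resource scaling driven by the exit probability, can be illustrated with a minimal Python sketch. This is not the ATHEENA implementation: the function names `early_exit_decision` and `second_stage_rate` are hypothetical, and using the maximum softmax probability as the confidence measure is an assumption made here for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def early_exit_decision(exit_logits, threshold):
    """Return (take_exit, predicted_class) for one sample.

    Assumption: confidence is the largest softmax probability at the
    early-exit classifier; the sample exits if it meets the threshold.
    """
    probs = softmax(exit_logits)
    confidence = max(probs)
    return confidence >= threshold, probs.index(confidence)


def second_stage_rate(input_rate, p_exit):
    """Average sample rate reaching the layers after the early exit.

    Assumption: a fraction p_exit of samples leaves at the early exit,
    so the later section only needs to sustain the remaining fraction.
    """
    return input_rate * (1.0 - p_exit)


if __name__ == "__main__":
    print(early_exit_decision([5.0, 0.0, 0.0], threshold=0.9))
    print(early_exit_decision([1.0, 1.0, 1.0], threshold=0.9))
    print(second_stage_rate(1000.0, p_exit=0.75))
```

Under these assumptions, a section of the network that follows an exit taken by 75% of samples has to sustain only a quarter of the input rate on average, which is the kind of slack a toolflow could trade for fewer resources or higher overall throughput.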
