Deep neural networks use skip connections to improve training convergence. However, these skip connections are costly in hardware: they require extra buffers and increase on- and off-chip memory utilization and bandwidth requirements. In this paper, we show that skip connections can be optimized for hardware when tackled with a hardware-software codesign approach. We argue that while a network's skip connections are needed for the network to learn, they can later be removed or shortened to provide a more hardware-efficient implementation with little to no accuracy loss. We introduce Tailor, a codesign tool whose hardware-aware training algorithm gradually removes or shortens a fully trained network's skip connections to lower their hardware cost. For on-chip, dataflow-style architectures, Tailor improves resource utilization by up to 34% for BRAMs, 13% for FFs, and 16% for LUTs. For a 2D processing element array architecture, Tailor increases performance by 30% and reduces memory bandwidth by 45%.
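To make the idea of gradually removing a skip connection concrete, the sketch below scales the skip path of a toy residual block by a factor that decays from 1 (skip fully present) to 0 (skip removed). This is only an illustration under stated assumptions: the abstract does not specify Tailor's actual mechanism, and the linear schedule, the names `skip_scale`, `residual_block`, and `toy_layer`, and the halving layer are all hypothetical. In a real flow the network's weights would presumably be fine-tuned between steps so that accuracy can recover; that retraining is omitted here.

```python
from typing import Callable, List

Vector = List[float]


def skip_scale(step: int, total_steps: int) -> float:
    """Hypothetical linear schedule: 1.0 (full skip) down to 0.0 (skip removed)."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    return max(0.0, 1.0 - step / total_steps)


def relu(v: Vector) -> Vector:
    """Element-wise ReLU."""
    return [max(0.0, a) for a in v]


def residual_block(x: Vector, layer: Callable[[Vector], Vector], alpha: float) -> Vector:
    """Compute relu(layer(x) + alpha * x).

    alpha = 1.0 is an ordinary residual block; alpha = 0.0 is the same
    block with its skip connection removed, so x no longer has to be
    buffered until the addition.
    """
    fx = layer(x)
    return relu([f + alpha * xi for f, xi in zip(fx, x)])


def toy_layer(x: Vector) -> Vector:
    """Stand-in for a conv/dense layer (assumption): halves each element."""
    return [0.5 * a for a in x]


if __name__ == "__main__":
    x = [1.0, -2.0, 3.0]
    total_steps = 4
    for step in range(total_steps + 1):
        alpha = skip_scale(step, total_steps)
        # A real flow would fine-tune the layer weights here before the next step.
        print(step, alpha, residual_block(x, toy_layer, alpha))
```

At the final step `alpha` reaches 0, so the block computes only `relu(layer(x))`. That is the form a hardware implementation can run without holding the block's input in a buffer.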