This paper introduces FlexNN, a Flexible Neural Network accelerator that adopts agile design principles to enable versatile dataflows, enhancing energy efficiency. Unlike conventional convolutional neural network accelerator architectures that adhere to fixed dataflows (such as input, weight, output, or row stationary) for transferring activations and weights between storage and compute units, our design enables dataflows of any type through software-configurable descriptors. Because data movement costs considerably outweigh compute costs from an energy perspective, this dataflow flexibility allows us to optimize the movement per layer for minimal data transfer and energy consumption, a capability unattainable in fixed-dataflow architectures. To further enhance throughput and reduce energy consumption in the FlexNN architecture, we propose a novel sparsity-based acceleration logic that exploits fine-grained sparsity in both the activation and weight tensors to bypass redundant computations, optimizing the convolution engine within the hardware accelerator. Extensive experimental results demonstrate significant improvements in the performance and energy efficiency of FlexNN relative to existing DNN accelerators.
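The descriptor-driven dataflow idea can be illustrated with a minimal sketch: the same MAC loop nest is executed under a loop-order permutation supplied as a "descriptor", so which tensor stays stationary is a per-layer software choice rather than a hardware constant. All names here (`conv1d`, the `"x"`/`"k"` dimension labels) are illustrative assumptions, not FlexNN's actual descriptor format or RTL.

```python
# Hypothetical sketch of a software-configurable dataflow descriptor:
# the loop nest order of a 1-D convolution is set by a tuple of
# dimension names, changing data-reuse pattern but not the result.
import itertools

import numpy as np


def conv1d(acts, wgts, loop_order):
    """1-D convolution whose loop nest order is given by a descriptor."""
    out = np.zeros(len(acts) - len(wgts) + 1, dtype=acts.dtype)
    dims = {"x": range(len(out)), "k": range(len(wgts))}
    # Iterate the loop nest in the order the descriptor dictates.
    for idx in itertools.product(*(dims[d] for d in loop_order)):
        pos = dict(zip(loop_order, idx))
        out[pos["x"]] += acts[pos["x"] + pos["k"]] * wgts[pos["k"]]
    return out


acts = np.arange(8)
wgts = np.array([1, -1, 2])
# An output-stationary order ("x" outer) and a weight-stationary order
# ("k" outer) yield identical outputs; only data reuse differs.
assert np.array_equal(conv1d(acts, wgts, ("x", "k")),
                      conv1d(acts, wgts, ("k", "x")))
```

In real hardware the descriptor additionally governs tiling and buffer residency, but the key property shown here carries over: correctness is invariant to loop order, so the order can be chosen per layer purely to minimize data movement.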
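The sparsity-based acceleration can likewise be sketched in a few lines: with fine-grained (element-level) sparsity in both operands, a MAC is needed only at positions where the activation AND the weight are nonzero, and all other products can be bypassed without changing the result. This is a behavioral model under an assumed bitmask encoding, not FlexNN's actual zero-skipping logic.

```python
# Hypothetical model of two-sided fine-grained sparsity skipping:
# perform a multiply-accumulate only where BOTH operands are nonzero.
import numpy as np


def sparse_dot(acts, wgts):
    """Dot product that skips MACs with a zero activation or weight."""
    pair_mask = (acts != 0) & (wgts != 0)   # positions needing a MAC
    macs = int(pair_mask.sum())             # MACs actually performed
    total = int(np.dot(acts[pair_mask], wgts[pair_mask]))
    return total, macs


rng = np.random.default_rng(0)
# Roughly 50%-sparse random activation and weight vectors.
acts = rng.integers(-3, 4, 16) * (rng.random(16) < 0.5)
wgts = rng.integers(-3, 4, 16) * (rng.random(16) < 0.5)

total, macs = sparse_dot(acts, wgts)
assert total == int(np.dot(acts, wgts))     # matches dense compute
assert macs <= np.count_nonzero(acts)       # fewer MACs than one-sided skipping
```

Because the intersection of two ~50%-sparse masks is typically far smaller than either mask alone, exploiting sparsity on both sides bypasses more work than activation-only or weight-only skipping, which is the effect the accelerator's convolution engine targets.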