Convolutional neural networks (CNNs) have emerged as a powerful and versatile tool for artificial intelligence (AI) applications. Conventional computing architectures struggle to meet the processing requirements of compute-intensive CNN applications because they suffer from limited throughput and low utilization. Specialized accelerators have therefore been developed to speed up CNN computations. However, as we demonstrate in this paper through extensive design space exploration, different neural network models have different characteristics and therefore call for different accelerator architectures and configurations to match their computing demands. We show that a one-size-fits-all fixed architecture does not guarantee an optimal power/energy/performance trade-off. To overcome this challenge, this paper proposes ARMAN, a novel reconfigurable systolic-array-based accelerator architecture for CNN inference built on Monolithic 3D (M3D) technology. The proposed accelerator can be reconfigured among different scale-up and scale-out arrangements depending on the neural network structure, providing the optimal trade-off across power, energy, and performance for various neural network models. We evaluate the approach on multiple benchmarks. The results show that the proposed accelerator achieves up to 2x, 2.24x, 1.48x, and 2x improvements in execution cycles, power, energy, and energy-delay product (EDP), respectively, over the non-configurable architecture.
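To make the scale-up versus scale-out distinction concrete, the following is a minimal sketch of how one might estimate execution cycles for a single layer under the two arrangements. It is not the model used in the paper: it assumes the layer is lowered to an M x K by K x N matrix multiplication, an output-stationary dataflow with a simple fill/drain term, and perfectly balanced tiles. It ignores memory bandwidth, power, and reconfiguration cost. All function names and parameters are hypothetical.

```python
from math import ceil


def tile_cycles(rows: int, cols: int, k: int) -> int:
    """Assumed cost of one output tile on a rows x cols array:
    k accumulation steps plus skew for filling and draining the array."""
    return k + rows + cols - 2


def scale_up_cycles(m: int, n: int, k: int, rows: int, cols: int) -> int:
    """One large rows x cols array processes all output tiles sequentially."""
    tiles = ceil(m / rows) * ceil(n / cols)
    return tiles * tile_cycles(rows, cols, k)


def scale_out_cycles(m: int, n: int, k: int,
                     rows: int, cols: int, parts: int) -> int:
    """`parts` smaller rows x cols arrays process output tiles in parallel."""
    tiles = ceil(m / rows) * ceil(n / cols)
    rounds = ceil(tiles / parts)  # batches of tiles executed concurrently
    return rounds * tile_cycles(rows, cols, k)


def edp(energy: float, delay: float) -> float:
    """Energy-delay product: lower is better."""
    return energy * delay


if __name__ == "__main__":
    # Hypothetical layer shape; both arrangements use 4096 processing elements.
    m, n, k = 256, 256, 64
    print("scale-up  (1 x 64x64):", scale_up_cycles(m, n, k, 64, 64))
    print("scale-out (16 x 16x16):", scale_out_cycles(m, n, k, 16, 16, 16))
```

Under these assumptions the ratio between the two estimates changes with the layer shape, which is the kind of model-dependent behavior that motivates a reconfigurable design. Because the sketch omits bandwidth and energy, it should not be read as showing that either arrangement is better in general.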