Deep learning accelerators address the computational demands of Deep Neural Networks (DNNs) by departing from the traditional Von Neumann execution model and using specialized hardware tailored to the structure of the application domain. Compiling for these accelerators poses challenges that compilers for general-purpose processors do not face: exposing and managing more micro-architectural features, handling software-managed scratchpads for on-chip storage, explicitly managing data movement, and matching DNN layers to the varying capabilities of the hardware. These complexities call for a new approach to compiler design, because traditional compilers mainly generate fine-grained instruction sequences while abstracting away micro-architectural details. This paper introduces the Architecture Covenant Graph (ACG), an abstract representation of an architecture's structural components and their programmable capabilities. Because the compiler operates on the ACG, its compilation workflow can adapt when the accelerator design changes, reducing the need to redevelop the compiler from scratch. Key to this process are Codelets, which express the functionality of DNN operations and evolve into execution mappings on the ACG. The Covenant compiler efficiently targets diverse deep learning accelerators: when compiling 14 DNN layers from various models for two different architectures, it achieves 93.8% of the performance of state-of-the-art, hand-tuned DNN layer implementations.
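The abstract describes the ACG only as a representation of components and their programmable capabilities, with Codelets mapped onto it. To make that idea concrete, here is a minimal Python sketch under our own assumptions. The names `Node`, `ACG`, `Codelet`, and `map_codelet`, the capability strings, the use of directed edges for data paths, and the greedy first-match mapping are all hypothetical. None of them is the Covenant compiler's actual API or mapping algorithm.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical ACG node: one hardware component and the ops it can be programmed to run."""
    name: str
    kind: str  # illustrative label, e.g. "compute" or "storage"
    capabilities: set[str] = field(default_factory=set)


@dataclass
class ACG:
    """Hypothetical graph: components plus directed data paths (edges are an assumption)."""
    nodes: dict[str, Node] = field(default_factory=dict)
    edges: set[tuple[str, str]] = field(default_factory=set)

    def add_node(self, node: Node) -> None:
        self.nodes[node.name] = node

    def connect(self, src: str, dst: str) -> None:
        # Record a directed data path between two already-registered components.
        if src not in self.nodes or dst not in self.nodes:
            raise KeyError(f"unknown component in edge {src!r} -> {dst!r}")
        self.edges.add((src, dst))

    def nodes_supporting(self, op: str) -> list[str]:
        # Components whose capabilities include `op`, in insertion order.
        return [n.name for n in self.nodes.values() if op in n.capabilities]


@dataclass
class Codelet:
    """Hypothetical codelet: a DNN operation expressed as a sequence of primitive ops."""
    name: str
    ops: list[str]


def map_codelet(codelet: Codelet, acg: ACG) -> list[tuple[str, str]]:
    """Greedy illustration only: bind each op to the first component that supports it."""
    mapping = []
    for op in codelet.ops:
        candidates = acg.nodes_supporting(op)
        if not candidates:
            raise ValueError(f"no component in the ACG supports {op!r}")
        mapping.append((op, candidates[0]))
    return mapping


# Toy accelerator (hypothetical): off-chip DRAM, a scratchpad, a systolic array, a SIMD unit.
acg = ACG()
acg.add_node(Node("dram", "storage"))
acg.add_node(Node("scratchpad", "storage", {"load", "store"}))
acg.add_node(Node("systolic_array", "compute", {"mac"}))
acg.add_node(Node("simd", "compute", {"relu", "add"}))
for src, dst in [("dram", "scratchpad"), ("scratchpad", "systolic_array"),
                 ("systolic_array", "simd"), ("simd", "scratchpad"), ("scratchpad", "dram")]:
    acg.connect(src, dst)

conv_relu = Codelet("conv_relu", ["load", "mac", "relu", "store"])
print(map_codelet(conv_relu, acg))
# [('load', 'scratchpad'), ('mac', 'systolic_array'), ('relu', 'simd'), ('store', 'scratchpad')]
```

In this sketch, the hardware description lives entirely in the graph. Changing the accelerator means editing the `ACG` instance, while `map_codelet` and the codelet stay unchanged, which loosely mirrors the adaptability argument above. A practical mapper would also check that data can flow between mapped components along the edges and would choose among candidates by cost rather than taking the first match.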