This paper addresses the need for automatic and efficient generation of host driver code for arbitrary custom AXI-based accelerators targeting linear algebra algorithms, which are an important workload in applications such as machine learning and scientific computing. While existing tools have focused on automating accelerator prototyping, little attention has been paid to host-accelerator interaction. This paper introduces AXI4MLIR, an extension of the MLIR compiler framework designed to automate the generation of host-accelerator driver code. With new MLIR attributes and transformations, AXI4MLIR lets users specify accelerator features (including their instructions) and communication patterns, and exploit the host memory hierarchy. We demonstrate AXI4MLIR's versatility across different types of accelerators and problems, showing reductions in CPU cache references of up to 56% and speedups of up to 1.65x compared to manually optimized driver code implementations. The AXI4MLIR implementation is open source and available at https://github.com/AXI4MLIR/axi4mlir.