CIM-MLC: A Multi-level Compilation Stack for Computing-In-Memory Accelerators

In recent years, various computing-in-memory (CIM) processors have been presented, showing superior performance over traditional architectures. To unleash the potential of CIM architectures that differ in device precision, crossbar size, and crossbar count, it is necessary to develop compilation tools that are fully aware of CIM architectural details and implementation diversity. However, because current popular open-source compilation stacks lack support for CIM architectures, existing CIM designs either deploy networks manually or build their own compilers, which is time-consuming and labor-intensive. Although some works expose specific CIM device programming interfaces to compilers, they are often bound to a fixed CIM architecture and lack the flexibility to support CIM architectures with different computing granularities. In addition, existing compilation work usually schedules only a limited set of operation types (such as crossbar-bound matrix-vector multiplication). Unlike conventional processors, CIM accelerators are characterized by diversity in architecture, circuit, and device, and a single level of abstraction cannot capture this diversity if we seek to fully exploit the advantages of CIM. Therefore, we propose CIM-MLC, a universal multi-level compilation framework for general CIM architectures. We first establish a general hardware abstraction for CIM architectures and computing modes to represent various CIM accelerators. Based on this abstraction, CIM-MLC can compile tasks onto a wide range of CIM accelerators with different devices, architectures, and programming interfaces. More importantly, compared with existing compilation work, CIM-MLC can explore mapping and scheduling strategies across multiple architectural tiers, which form a tractable yet effective design space, to achieve better scheduling and instruction generation results.
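To make the idea of a tiered hardware abstraction concrete, the following is a minimal sketch, not CIM-MLC's actual interface. It assumes a hypothetical three-tier description (chip, core, crossbar) parameterized by the quantities the abstract names: device precision, crossbar size, and crossbar count. All class, field, and function names here are illustrative assumptions, as is the simple tiling rule used to estimate how many crossbars a weight matrix occupies.

```python
from dataclasses import dataclass
from math import ceil


# Hypothetical tier descriptions; names and fields are assumptions for illustration.
@dataclass(frozen=True)
class Crossbar:
    rows: int       # wordlines (input dimension of one crossbar)
    cols: int       # bitlines (output dimension of one crossbar)
    cell_bits: int  # device precision: bits stored per memory cell


@dataclass(frozen=True)
class Core:
    crossbar: Crossbar
    num_crossbars: int  # crossbars available in one core


@dataclass(frozen=True)
class Chip:
    core: Core
    num_cores: int  # cores available on the chip


def crossbars_needed(in_features: int, out_features: int,
                     weight_bits: int, xb: Crossbar) -> int:
    """Estimate crossbars required for an in_features x out_features weight matrix.

    Assumes each weight is bit-sliced across adjacent columns and the matrix
    is tiled without any packing optimization.
    """
    cells_per_weight = ceil(weight_bits / xb.cell_bits)
    row_tiles = ceil(in_features / xb.rows)
    col_tiles = ceil(out_features * cells_per_weight / xb.cols)
    return row_tiles * col_tiles


def cores_needed(n_crossbars: int, core: Core) -> int:
    """Number of cores required to host the given number of crossbars."""
    return ceil(n_crossbars / core.num_crossbars)


if __name__ == "__main__":
    xb = Crossbar(rows=128, cols=128, cell_bits=2)
    core = Core(crossbar=xb, num_crossbars=8)
    chip = Chip(core=core, num_cores=16)

    n_xb = crossbars_needed(512, 256, weight_bits=8, xb=xb)
    n_cores = cores_needed(n_xb, core)
    print(f"crossbars: {n_xb}, cores: {n_cores}, fits on chip: {n_cores <= chip.num_cores}")
```

Under these assumptions, changing the device precision or crossbar size changes the resource estimate at every tier, which is the kind of architecture-dependent decision a single-level abstraction would hide from the compiler.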
