Crossbar-array-based in-memory accelerators have recently gained interest due to their high throughput and energy efficiency. Although software and compiler support for these in-memory accelerators has been introduced, it is currently limited to the case where all weights are assumed to reside on chip. This limitation becomes apparent as network sizes grow far beyond the available in-memory footprint, making weight replacement schemes essential. We propose COMPASS, a compiler framework for resource-constrained crossbar-based processing-in-memory (PIM) deep neural network (DNN) accelerators. COMPASS specifically targets networks that exceed the capacity of the PIM crossbar arrays and therefore require access to external memory. We propose an algorithm that determines the optimal partitioning, dividing the layers so that each partition can be accelerated on chip. Our scheme takes into account the data dependence between layers, core utilization, and the number of write instructions to minimize latency, reduce memory accesses, and improve energy efficiency. Simulation results demonstrate that COMPASS accommodates many more networks within a minimal memory footprint while improving throughput by 1.78X and reducing the energy-delay product (EDP) by 1.28X over baseline partitioning methods.
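To make the partitioning problem concrete, the following is a minimal sketch, not the actual COMPASS algorithm: it uses a simple greedy heuristic that packs consecutive layers into partitions whose total weight size fits the on-chip crossbar capacity, minimizing the number of partitions as a proxy for weight-write (reprogramming) overhead. The real scheme described above additionally accounts for inter-layer data dependence and core utilization; the function name, the unit-based sizes, and the capacity value are all illustrative assumptions.

```python
# Hypothetical sketch (not the paper's algorithm): contiguous layer
# partitioning under an on-chip weight-capacity budget. Each new
# partition implies reprogramming the crossbar arrays, so fewer
# partitions means fewer weight-write instructions.

def partition_layers(weight_sizes, capacity):
    """Greedily pack consecutive layers into the current partition
    until adding the next layer would exceed on-chip capacity."""
    if any(w > capacity for w in weight_sizes):
        raise ValueError("a single layer exceeds on-chip capacity")
    partitions, current, used = [], [], 0
    for layer, w in enumerate(weight_sizes):
        if used + w > capacity:
            partitions.append(current)   # close the full partition
            current, used = [], 0        # start a fresh one
        current.append(layer)
        used += w
    partitions.append(current)           # flush the last partition
    return partitions

# Example: six layers (weight sizes in arbitrary units), capacity 10.
print(partition_layers([4, 3, 5, 2, 6, 1], capacity=10))
# → [[0, 1], [2, 3], [4, 5]]
```

For a fixed layer order, this greedy "fill until overflow" pass yields the minimum number of contiguous partitions; the harder part, which COMPASS addresses, is choosing partitions that also balance core utilization and honor data dependences between layers.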