SRAM-based compute-in-memory (CIM) offers high computational density and energy efficiency for deep neural network (DNN) accelerators, but its limited capacity incurs on-/off-chip data movement overhead when running large DNN models. Existing CIM accelerator studies typically assume that the entire DNN model fits on-chip, leaving the design of efficient dataflows largely unexplored. This paper introduces AccelCIM, a systematic dataflow exploration framework for SRAM CIM accelerators that addresses two key limitations of prior work. (1) It formulates a systematic dataflow design space spanning CIM macro configurations and macro-array organizations. (2) It evaluates designs rigorously using cycle-accurate architectural simulation and post-layout power, performance, and area (PPA) analysis. We conduct an extensive design space exploration and apply AccelCIM to representative large language model (LLM) applications, providing practical insights for the principled design of CIM accelerators.
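To make the capacity problem concrete, the sketch below estimates off-chip weight traffic for a weight-stationary SRAM CIM accelerator when a model does or does not fit on-chip. It is a simplified illustration, not AccelCIM's cost model. Its assumptions are that each SRAM bitcell stores one weight bit, and that the accelerator uses a hypothetical "pin what fits, re-stream the rest every pass" policy. The function names and all numeric parameters (macro count, macro dimensions, model size, decode steps) are illustrative only.

```python
# Back-of-envelope estimate of off-chip weight traffic for a weight-stationary
# SRAM CIM accelerator. All names and parameters are hypothetical illustrations.


def cim_capacity_bytes(num_macros: int, rows: int, cols: int) -> int:
    """On-chip weight capacity, assuming one weight bit per SRAM bitcell."""
    return num_macros * rows * cols // 8


def offchip_weight_traffic(weight_bytes: int, capacity_bytes: int, passes: int) -> int:
    """Bytes of weights fetched from off-chip memory over `passes` forward passes.

    Assumed policy: pin as many weights as fit in the CIM macros (loaded once),
    and re-stream the remainder on every pass (no compression, no reuse of the
    streamed portion across passes).
    """
    if passes < 1:
        raise ValueError("passes must be >= 1")
    if weight_bytes <= capacity_bytes:
        return weight_bytes  # model fits: every weight is loaded exactly once
    resident = capacity_bytes                 # loaded once, stays in the macros
    streamed = weight_bytes - capacity_bytes  # reloaded on every pass
    return resident + streamed * passes


if __name__ == "__main__":
    # Hypothetical array: 256 macros of 256 x 256 bitcells (2 MiB in total).
    cap = cim_capacity_bytes(num_macros=256, rows=256, cols=256)
    model = 7 * 10**9  # e.g. a 7B-parameter model with 8-bit weights
    steps = 128        # e.g. 128 autoregressive decode steps, one pass each
    traffic = offchip_weight_traffic(model, cap, steps)
    print(f"capacity = {cap} B, off-chip weight traffic = {traffic / 1e9:.1f} GB")
```

Under these assumptions, a model that fits on-chip is fetched once regardless of the number of passes. A model that is far larger than the macro array is re-fetched almost in full on every pass. This is why dataflow choices, such as which weights stay resident and how tiles are scheduled, matter once the "fits entirely on-chip" assumption no longer holds.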