Compute-in-Memory (CIM) architectures have been widely studied for deep neural network (DNN) acceleration because they reduce the data transfer overhead between memory and computing units. In conventional CIM design flows, system-level CIM simulators (such as NeuroSim) are leveraged for design space exploration (DSE) across different hardware configurations and DNN workloads. However, CIM designers must invest substantial effort in interpreting simulator manuals and understanding complex parameter dependencies. Moreover, extensive design-simulation iterations are often required to identify optimal CIM configurations under hardware constraints. These challenges severely prolong the DSE cycle and hinder rapid CIM deployment. To address them, this work proposes ChatNeuroSim, a large language model (LLM)-based agent framework for automated CIM accelerator deployment and optimization. ChatNeuroSim automates the entire CIM workflow, including task scheduling, request parsing and adjustment, parameter dependency checking, script generation, and simulation execution. It also integrates the proposed CIM optimizer, which uses design space pruning to enable rapid identification of optimal configurations for different DNN workloads. ChatNeuroSim is evaluated on extensive request-level testbenches and demonstrates correct simulation and optimization behavior, validating its effectiveness in automatic request parsing and task execution. Furthermore, the proposed design space pruning technique accelerates the CIM optimization process compared to a no-pruning baseline. In a case study optimizing Swin Transformer Tiny at the 22 nm technology node, the proposed CIM optimizer achieves a 0.42$\times$-0.79$\times$ average runtime reduction compared to the same optimization algorithm without design space pruning.