Software-hardware co-design is essential for optimizing in-memory computing (IMC) hardware accelerators for neural networks. However, most existing optimization frameworks target a single workload, leading to highly specialized hardware designs that do not generalize well across models and applications. In contrast, practical deployment scenarios require a single IMC platform that can efficiently support multiple neural network workloads. This work presents a joint hardware-workload co-optimization framework based on an optimized evolutionary algorithm for designing generalized IMC accelerator architectures. By explicitly capturing cross-workload trade-offs rather than optimizing for a single model, the proposed approach significantly reduces the performance gap between workload-specific and generalized IMC designs. The framework is evaluated on both RRAM- and SRAM-based IMC architectures, demonstrating strong robustness and adaptability across diverse design scenarios. Compared to baseline methods, the optimized designs achieve energy-delay-area product (EDAP) reductions of up to 76.2% and 95.5% when optimizing across a small set (4 workloads) and a large set (9 workloads), respectively. The source code of the framework is available at https://github.com/OlgaKrestinskaya/JointHardwareWorkloadOptimizationIMC.
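To make the cross-workload objective concrete, the sketch below shows one way such a search could be structured: an evolutionary loop that selects an IMC hardware configuration minimizing the *mean* energy-delay-area product over a set of workloads, rather than the EDAP of any single model. This is a minimal illustration, not the paper's algorithm: the design-space parameters (`crossbar_size`, `adc_precision`, `num_tiles`), the toy cost model, and all numeric constants are hypothetical stand-ins for the framework's real simulator-backed evaluation.

```python
import random

# Hypothetical IMC design space (illustrative names, not from the paper):
# each candidate design is a tuple (crossbar_size, adc_precision, num_tiles).
SPACE = {
    "crossbar_size": [64, 128, 256, 512],
    "adc_precision": [4, 6, 8],
    "num_tiles": [8, 16, 32, 64],
}

def edap(design, workload):
    """Toy stand-in for an IMC cost model returning energy * delay * area.
    A real framework would query a hardware simulator per workload mapping."""
    xb, adc, tiles = design
    energy = workload["macs"] / (xb * tiles) * adc        # toy energy model
    delay = workload["macs"] / (xb * xb * tiles)          # toy latency model
    area = tiles * xb * xb * 1e-6 + tiles * adc * 1e-3    # toy area model
    return energy * delay * area

def fitness(design, workloads):
    # Cross-workload objective: mean EDAP over all workloads, so the search
    # favors designs that generalize instead of specializing to one model.
    return sum(edap(design, w) for w in workloads) / len(workloads)

def evolve(workloads, pop_size=20, generations=30, seed=0):
    """Simple (mu + lambda)-style evolutionary search over SPACE."""
    rng = random.Random(seed)
    keys = list(SPACE)

    def rand_design():
        return tuple(rng.choice(SPACE[k]) for k in keys)

    pop = [rand_design() for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda d: fitness(d, workloads))
        survivors = pop[: pop_size // 2]          # truncation selection
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            # Uniform crossover: pick each gene from either parent.
            child = tuple(rng.choice(pair) for pair in zip(a, b))
            if rng.random() < 0.3:                # mutation: re-sample one gene
                i = rng.randrange(len(keys))
                child = child[:i] + (rng.choice(SPACE[keys[i]]),) + child[i + 1:]
            children.append(child)
        pop = survivors + children
    return min(pop, key=lambda d: fitness(d, workloads))

# Four toy workloads characterized only by MAC count (the "small set" case).
workloads = [{"macs": m} for m in (1e8, 5e8, 2e9, 8e9)]
best = evolve(workloads)
```

Evaluating fitness as an aggregate over all workloads is what distinguishes this setup from single-workload co-design: a design that excels on one model but collapses on the others scores poorly and is selected out.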