Computing-in-Memory (CIM) has shown great potential for improving the efficiency and performance of deep neural networks (DNNs). However, the lack of flexibility in CIM causes computational resources to be spent unnecessarily on less critical operations and lowers the Signal-to-Noise Ratio (SNR) on more complex tasks, which significantly hinders overall performance. We therefore focus on integrating CIM with Saliency-Aware Computing, a paradigm that dynamically tailors computing precision to the importance of each input. We propose On-the-fly Saliency-Aware Hybrid CIM (OSA-HCIM), which offers three primary contributions: (1) an On-the-fly Saliency-Aware (OSA) precision configuration scheme, which dynamically sets the precision of each MAC operation based on its saliency; (2) a Hybrid CIM Array (HCIMA), which enables simultaneous operation of digital-domain CIM (DCIM) and analog-domain CIM (ACIM) via split-port 6T SRAM; and (3) an integrated framework combining OSA and HCIMA to meet diverse accuracy and power demands. Implemented in a 65 nm CMOS process, OSA-HCIM demonstrates an exceptional balance between accuracy and resource utilization. Notably, it is the first CIM design to incorporate a dynamic digital-to-analog boundary, providing unprecedented flexibility for saliency-aware computing. OSA-HCIM achieves a 1.95x improvement in energy efficiency with minimal accuracy loss compared to DCIM when tested on the CIFAR100 dataset.
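The idea of a saliency-dependent digital-to-analog boundary can be illustrated with a small behavioural sketch. It is not the authors' implementation: the function names `hybrid_mac` and `choose_boundary`, the thresholds, the boundary values, and the additive Gaussian noise standing in for analog error are all assumptions made for illustration. Weight bit-planes at or above the boundary are accumulated exactly (a stand-in for DCIM), while bit-planes below it are accumulated with noise (a stand-in for ACIM).

```python
import random


def hybrid_mac(inputs, weights, boundary, bits=8, noise_std=0.0, rng=None):
    """Dot product of inputs with unsigned `bits`-bit weights, split by weight bit-plane.

    Bit-planes with index >= boundary are summed exactly (digital stand-in).
    Bit-planes with index < boundary get additive Gaussian noise
    (a hypothetical analog error model, not taken from the paper).
    """
    rng = rng or random.Random(0)
    total = 0.0
    for b in range(bits):
        # Partial sum contributed by bit b of every weight.
        partial = sum(x * ((w >> b) & 1) for x, w in zip(inputs, weights))
        if b < boundary:
            partial += rng.gauss(0.0, noise_std)
        total += partial * (1 << b)
    return total


def choose_boundary(saliency, thresholds=(0.25, 0.5, 0.75), boundaries=(6, 4, 2, 0)):
    """Map a saliency score in [0, 1] to a digital-to-analog boundary.

    Higher saliency selects a lower boundary, so more bit-planes are
    computed digitally. Thresholds and boundaries are illustrative values.
    """
    level = sum(saliency >= t for t in thresholds)  # 0 .. len(thresholds)
    return boundaries[level]


# Usage: a highly salient MAC is computed fully digitally, a less salient one mostly in analog.
x, w = [1, 2, 3], [4, 5, 6]
precise = hybrid_mac(x, w, choose_boundary(0.9), noise_std=0.5)
approx = hybrid_mac(x, w, choose_boundary(0.1), noise_std=0.5)
```

In this sketch, lowering the boundary trades energy for accuracy in the same spirit as the scheme described above: a salient operation keeps every bit-plane exact, while a less salient one tolerates noise on its low-order bit-planes.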