Compute-in-memory (CIM) is an efficient method for implementing deep neural networks (DNNs) but suffers from substantial overhead from analog-to-digital converters (ADCs), especially as ADC precision increases. Low-precision ADCs can reduce this overhead but introduce partial-sum quantization errors that degrade accuracy. Additionally, low-bit weight constraints, imposed by cell limitations and the need for multiple cells to store higher-bit weights, present further challenges. While fine-grained partial-sum quantization has been studied as an effective way to lower ADC resolution, weight quantization granularity, which limits the accuracy attainable under partial-sum quantization, remains underexplored. This work addresses these challenges by aligning weight and partial-sum quantization granularities at the column-wise level. Our method improves accuracy without increasing dequantization overhead, simplifies training by eliminating the two-stage process, and ensures robustness to memory cell variations via independent column-wise scale factors. We also propose an open-source CIM-oriented convolution framework that handles fine-grained weights and partial sums efficiently, incorporating a novel tiling method and group convolution. Experimental results on ResNet-20 (CIFAR-10, CIFAR-100) and ResNet-18 (ImageNet) show accuracy improvements of 0.99%, 2.69%, and 1.01%, respectively, over the best-performing related works. Additionally, variation analysis confirms the robustness of our method against memory cell variations. These findings highlight the effectiveness of our quantization scheme in enhancing accuracy and robustness while maintaining hardware efficiency in CIM-based DNN implementations. Our code is available at https://github.com/jiyoonkm/ColumnQuant.
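To make the column-wise alignment concrete, the sketch below models one crossbar tile: weights are quantized with one scale factor per column, partial sums read out through a low-precision ADC are quantized per column with calibrated ranges, and both are dequantized with the same column-wise scales. This is a minimal illustration under assumed bit-widths and shapes, not the paper's implementation; all function names here are hypothetical.

```python
import numpy as np

def quantize_columnwise(w, n_bits=4):
    # Symmetric per-column weight quantization: one scale per crossbar column.
    qmax = 2 ** (n_bits - 1) - 1
    scale = np.maximum(np.abs(w).max(axis=0), 1e-8) / qmax
    q = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def adc_readout(psum, ps_range, adc_bits=4):
    # Low-precision ADC modeled as uniform quantization of each column's
    # partial sum; ps_range holds calibrated per-column clipping ranges.
    levels = 2 ** (adc_bits - 1) - 1
    step = np.maximum(ps_range, 1e-8) / levels
    return np.clip(np.round(psum / step), -levels - 1, levels) * step

rng = np.random.default_rng(0)
w = rng.standard_normal((128, 16))        # 128 rows x 16 columns of one tile
q_w, s_w = quantize_columnwise(w)

# Calibrate per-column partial-sum ranges on a small batch of binary inputs.
X_cal = rng.integers(0, 2, size=(32, 128))
ps_range = np.abs(X_cal @ q_w).max(axis=0).astype(float)

x = rng.integers(0, 2, size=128)          # one binary input vector
psum = (x @ q_w).astype(float)            # integer partial sums, one per column
y = adc_readout(psum, ps_range) * s_w     # shared column-wise dequantization
```

Because the weight scale and the partial-sum scale are both indexed by column, dequantization stays a single per-column multiply, which is what lets each column tolerate independent memory cell variation.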