We propose Multiplier-less INTeger (MINT) quantization, a uniform quantization scheme that efficiently compresses weights and membrane potentials in spiking neural networks (SNNs). Unlike previous SNN quantization methods, MINT quantizes the memory-intensive membrane potentials to an extremely low precision (2-bit), significantly reducing the memory footprint. MINT also shares the quantization scaling factor between weights and membrane potentials, eliminating the multipliers that conventional uniform quantization requires. Experimental results show that our method matches the accuracy of full-precision models and other state-of-the-art SNN quantization techniques while surpassing them in memory footprint reduction and hardware cost efficiency at deployment. For example, 2-bit MINT VGG-16 achieves 90.6% accuracy on CIFAR-10, with a roughly 93.8% reduction in memory footprint relative to the full-precision model and a 90% reduction in computation energy compared to vanilla uniform quantization at deployment. The code is available at https://github.com/Intelligent-Computing-Lab-Yale/MINT-Quantization.
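To illustrate why a shared scaling factor can remove multipliers, the following is a minimal sketch and not the released MINT implementation. It assumes symmetric uniform quantization, a single integrate-and-fire neuron with no leak, and a hard reset to zero; the function names `quantize` and `if_step_int` and all numeric values are hypothetical. With separate scales for weights and membrane potentials, each update would need a rescaling multiplication before the two could be added. With one shared scale, the quantized weights and the quantized membrane potential live on the same integer grid, and because input spikes are binary, the update reduces to integer additions and a comparison.

```python
def quantize(values, scale, bits):
    """Symmetric uniform quantization to signed integers (illustrative only)."""
    qmin, qmax = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return [max(qmin, min(qmax, round(v / scale))) for v in values]


def if_step_int(u_int, w_int, spikes_in, theta_int, bits):
    """One integer-only integrate-and-fire step for a single neuron.

    u_int and w_int are assumed to share the same scaling factor, so the
    weights can be accumulated into the membrane potential directly.
    """
    qmin, qmax = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    for w, s in zip(w_int, spikes_in):
        if s:  # binary spike: accumulate the weight, no multiplication
            u_int += w
    # Keep the stored membrane potential within the low-precision range
    # (assumption: simple clamping).
    u_int = max(qmin, min(qmax, u_int))
    fired = u_int >= theta_int
    if fired:
        u_int = 0  # assumption: hard reset after a spike
    return u_int, int(fired)


# Hypothetical values: one scale shared by weights, potential, and threshold.
scale = 0.5
w_int = quantize([0.4, -0.9, 0.6], scale, bits=2)
theta_int = round(0.5 / scale)  # threshold expressed on the same integer grid
state, spike = if_step_int(0, w_int, [1, 0, 1], theta_int, bits=2)
```

The scale appears only when the weights and threshold are quantized offline; the per-timestep update in `if_step_int` never touches it, which is the property the shared scaling factor is meant to provide.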