We propose Quamba-SE, a soft-edge quantizer for State Space Model (SSM) activation quantization. Unlike existing methods that use standard INT8 operations with a single scale, Quamba-SE employs three adaptive scales: a high-precision scale for small values, a standard scale for normal values, and a low-precision scale for outliers. This preserves outlier information instead of hard-clipping it, while maintaining precision for the remaining values. We evaluate on Mamba-130M across 6 zero-shot benchmarks. Results show that Quamba-SE consistently outperforms Quamba, achieving up to +2.68% on individual benchmarks and up to +0.83% improvement in average accuracy across the 6 datasets.
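The three-scale idea above can be sketched as follows. This is a minimal illustration, not the actual Quamba-SE implementation: the thresholds `t_small` and `t_out` and the per-region scale choices are assumptions for demonstration, and a real kernel would fuse this logic into the INT8 pipeline.

```python
import numpy as np

def soft_edge_quantize(x, t_small=0.1, t_out=2.0, n_bits=8):
    """Hypothetical three-scale quantizer sketch: small, normal, and
    outlier values each get their own scale instead of hard-clipping
    outliers at a single clipping threshold."""
    qmax = 2 ** (n_bits - 1) - 1  # 127 for INT8
    mag = np.abs(x)
    # Partition values into three magnitude regions (thresholds assumed).
    small = mag < t_small
    outlier = mag >= t_out
    # Per-region scales: finer scale preserves precision for small values,
    # coarser scale keeps outlier information instead of clipping it.
    s_small = t_small / qmax
    s_normal = t_out / qmax
    s_out = np.max(mag, initial=t_out) / qmax
    scales = np.where(small, s_small, np.where(outlier, s_out, s_normal))
    q = np.clip(np.round(x / scales), -qmax - 1, qmax).astype(np.int8)
    return q, scales

def dequantize(q, scales):
    return q.astype(np.float32) * scales

# Small, normal, and outlier activations all survive quantization.
x = np.array([0.01, 0.5, 3.0, -0.02, -5.0], dtype=np.float32)
q, s = soft_edge_quantize(x)
x_hat = dequantize(q, s)
```

With a single standard scale derived from the maximum (5.0 here), the small values 0.01 and -0.02 would round to zero; with the soft-edge scheme each region keeps a usable resolution.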