AHCQ-SAM: Toward Accurate and Hardware-Compatible Post-Training Segment Anything Model Quantization

The Segment Anything Model (SAM) has revolutionized image and video segmentation with its powerful zero-shot capabilities. However, its massive parameter count and high computational demands hinder efficient deployment on resource-constrained edge devices. Post-Training Quantization (PTQ) offers a practical solution, but existing methods still fail to handle four critical quantization challenges: (1) ill-conditioned weights; (2) skewed and long-tailed post-GELU activations; (3) pronounced inter-channel variance in linear projections; and (4) exponentially scaled and heterogeneous attention scores. To mitigate these bottlenecks, we propose AHCQ-SAM, an accurate and hardware-compatible PTQ framework with four synergistic components: (1) Activation-aware Condition Number Reduction (ACNR), which regularizes weight matrices via a proximal point algorithm to suppress ill-conditioning; (2) Hybrid Log-Uniform Quantization (HLUQ), which combines power-of-two and uniform quantizers to capture skewed post-GELU activations; (3) Channel-Aware Grouping (CAG), which clusters channels with homogeneous statistics to achieve high accuracy with minimal hardware overhead; and (4) Logarithmic Nonlinear Quantization (LNQ), which uses logarithmic transformations to adaptively adjust quantization resolution for exponential and heterogeneous attention scores. Experimental results demonstrate that AHCQ-SAM outperforms existing PTQ methods on SAM. Compared with the state-of-the-art method, it achieves a 15.2% improvement in mAP for 4-bit SAM-B with Faster R-CNN on the COCO dataset. Furthermore, we establish a PTQ benchmark for SAM2, where AHCQ-SAM yields a 14.01% improvement in J&F for 4-bit SAM2-Tiny on the SA-V Test dataset. Finally, an FPGA-based implementation validates the practical utility of AHCQ-SAM, delivering a 7.12x speedup and a 6.62x power efficiency improvement over the floating-point baseline.
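
The abstract names the quantizers but does not define them. As a rough illustration of why a logarithmic grid can suit softmax-style attention scores, the sketch below compares a plain uniform quantizer with a generic power-of-two (log2) quantizer at 4 bits. The function names, the bit width, and the sample scores are assumptions made for illustration only; this is not the LNQ or HLUQ design from the paper.

```python
import numpy as np


def uniform_quantize(x, bits=4):
    """Uniform quantizer on [0, max(x)] using 2**bits evenly spaced levels.

    Assumes x is non-negative and has a strictly positive maximum.
    """
    levels = 2 ** bits - 1
    scale = float(np.max(x)) / levels          # step size between levels
    q = np.clip(np.round(x / scale), 0, levels)
    return q * scale                           # dequantized values


def log2_quantize(x, bits=4, eps=1e-12):
    """Power-of-two quantizer for values in (0, 1]: x is mapped to 2**(-q).

    The integer code q is the rounded negative base-2 logarithm, so small
    values get finer resolution than they would on a uniform grid.
    """
    levels = 2 ** bits - 1
    q = np.clip(np.round(-np.log2(np.maximum(x, eps))), 0, levels)
    return 2.0 ** (-q)                         # dequantized values


# Hypothetical attention probabilities spanning two orders of magnitude.
scores = np.array([1.0, 0.5, 0.25, 0.01])
uni = uniform_quantize(scores)
log = log2_quantize(scores)

for s, u, l in zip(scores, uni, log):
    print(f"score={s:.4f}  uniform={u:.4f}  log2={l:.6f}")
```

On this toy input the uniform grid rounds the smallest score to zero, while the log2 grid keeps it at the nearest power of two. A power-of-two code also lets the multiplication by a quantized score be realized as a bit shift, which is one common reason such grids are considered hardware-friendly.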
