Data-free quantization (DFQ) enables model quantization without access to real data, addressing concerns regarding data security and privacy. With the growing adoption of Vision Transformers (ViTs), DFQ for ViTs has garnered significant attention. However, existing DFQ methods exhibit two limitations: (1) semantic distortion, where the semantics of synthetic images deviate substantially from those of real images, and (2) semantic inadequacy, where synthetic images contain extensive regions with limited content and oversimplified textures, leading to suboptimal quantization performance. To address these limitations, we propose SARDFQ, a novel Semantics Alignment and Reinforcement Data-Free Quantization method for ViTs. To address semantic distortion, SARDFQ incorporates Attention Priors Alignment (APA), which optimizes synthetic images to follow randomly generated structure attention priors. To mitigate semantic inadequacy, SARDFQ introduces Multi-Semantic Reinforcement (MSR), leveraging localized patch optimization to enhance semantic richness across synthetic images. Furthermore, SARDFQ employs Soft-Label Learning (SL), wherein multiple semantic targets are adapted to facilitate the learning of the multi-semantic images augmented by MSR. Extensive experiments demonstrate the effectiveness of SARDFQ, which significantly surpasses existing methods. For example, SARDFQ improves top-1 accuracy on ImageNet by 15.52% for W4A4 ViT-B. Code is available at https://github.com/zysxmu/SARDFQ.
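To make the APA idea concrete, below is a minimal PyTorch sketch of how a synthetic image could be pushed to follow a randomly generated structure attention prior. This is an illustration under assumptions, not the paper's implementation: the prior generator (`random_structure_prior`), the loss (`apa_loss`), and the shape conventions are all hypothetical; see the repository above for the actual method.

```python
import torch
import torch.nn.functional as F

def random_structure_prior(h, w, device="cpu"):
    # Hypothetical prior: a smooth random blob upsampled to the attention
    # grid and normalized into a distribution. The paper's construction of
    # "randomly generated structure attention priors" may differ.
    coarse = torch.rand(1, 1, max(h // 4, 1), max(w // 4, 1), device=device)
    prior = F.interpolate(coarse, size=(h, w), mode="bilinear",
                          align_corners=False).flatten(1)
    return F.softmax(prior, dim=-1)  # shape (1, h*w)

def apa_loss(attn_maps, prior):
    # attn_maps: (B, heads, h, w) CLS-to-patch attention from the frozen
    #            full-precision ViT, reshaped onto the spatial patch grid.
    # prior:     (1, h*w) target distribution from random_structure_prior.
    attn_log = attn_maps.mean(dim=1).flatten(1).log_softmax(dim=-1)
    target = prior.expand_as(attn_log)
    # KL divergence pulls the model's attention toward the prior; the
    # gradient flows back into the synthetic image being optimized.
    return F.kl_div(attn_log, target, reduction="batchmean")
```

In a synthesis loop one would presumably treat the image tensor as a leaf parameter, run the full-precision ViT on it, extract the attention maps, and add `apa_loss` to the other generation objectives before stepping the image optimizer.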