PSAQ-ViT V2: Towards Accurate and General Data-Free Quantization for Vision Transformers

Data-free quantization can potentially address data privacy and security concerns in model compression, and has therefore been widely investigated. Recently, PSAQ-ViT designed a relative value metric, patch similarity, to generate data from pre-trained vision transformers (ViTs), making the first attempt at data-free quantization for ViTs. In this paper, we propose PSAQ-ViT V2, a more accurate and general data-free quantization framework for ViTs, built on top of PSAQ-ViT. More specifically, following the patch similarity metric in PSAQ-ViT, we introduce an adaptive teacher-student strategy, which facilitates the constant cyclic evolution of the generated samples and the quantized model (student) in a competitive and interactive fashion under the supervision of the full-precision model (teacher), thus significantly improving the accuracy of the quantized model. Moreover, without auxiliary category guidance, we employ task- and model-independent prior information, making the general-purpose scheme compatible with a broad range of vision tasks and models. Extensive experiments are conducted on various models for image classification, object detection, and semantic segmentation, and PSAQ-ViT V2, with a naive quantization strategy and without access to real-world data, consistently achieves competitive results, showing potential as a powerful baseline for data-free quantization of ViTs. For instance, with Swin-S as the (backbone) model, 8-bit quantization reaches 82.13% top-1 accuracy on ImageNet, 50.9 box AP and 44.1 mask AP on COCO, and 47.2 mIoU on ADE20K. We hope that the accurate and general PSAQ-ViT V2 can serve as a potential and practical solution in real-world applications involving sensitive data. Code is released and merged at: https://github.com/zkkli/PSAQ-ViT.
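The abstract does not spell out what the "naive quantization strategy" is. Purely as an illustration, and assuming it means something like uniform min-max quantization, the sketch below shows how an 8-bit scheme of that kind maps real values to integer codes and back. The function names `quantize_minmax` and `dequantize` are hypothetical and are not taken from the PSAQ-ViT repository.

```python
from typing import List, Tuple


def quantize_minmax(values: List[float], bits: int = 8) -> Tuple[List[int], float, int]:
    """Asymmetric uniform quantization over the min-max range.

    Assumption for illustration only: this is a generic textbook scheme,
    not necessarily the quantizer used in PSAQ-ViT V2.
    """
    qmax = 2 ** bits - 1  # 255 for 8 bits
    lo, hi = min(values), max(values)
    # Guard against a constant input, where the range would be zero.
    scale = (hi - lo) / qmax if hi > lo else 1.0
    zero_point = round(-lo / scale)
    # Round to the nearest step, shift by the zero point, clip to [0, qmax].
    codes = [min(qmax, max(0, round(v / scale) + zero_point)) for v in values]
    return codes, scale, zero_point


def dequantize(codes: List[int], scale: float, zero_point: int) -> List[float]:
    """Map integer codes back to approximate real values."""
    return [(c - zero_point) * scale for c in codes]


if __name__ == "__main__":
    weights = [-1.0, 0.0, 0.5, 2.0]
    codes, scale, zero_point = quantize_minmax(weights)
    restored = dequantize(codes, scale, zero_point)
    for w, c, r in zip(weights, codes, restored):
        print(f"{w:+.4f} -> code {c:3d} -> {r:+.4f}")
```

In this sketch the scale and zero point come directly from the values being quantized. In a data-free setting no real data is available to estimate activation ranges, which is the gap the generated samples described above are meant to fill.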

