Data-free quantization can potentially address data privacy and security concerns in model compression and has therefore been widely investigated. Recently, PSAQ-ViT designed a relative value metric, patch similarity, to generate data from pre-trained vision transformers (ViTs), making the first attempt at data-free quantization for ViTs. In this paper, we propose PSAQ-ViT V2, a more accurate and general data-free quantization framework for ViTs, built on top of PSAQ-ViT. More specifically, following the patch similarity metric in PSAQ-ViT, we introduce an adaptive teacher-student strategy, which facilitates the constant cyclic evolution of the generated samples and the quantized model (student) in a competitive and interactive fashion under the supervision of the full-precision model (teacher), thus significantly improving the accuracy of the quantized model. Moreover, without auxiliary category guidance, we employ task- and model-independent prior information, making the general-purpose scheme compatible with a broad range of vision tasks and models. Extensive experiments are conducted on various models on image classification, object detection, and semantic segmentation tasks, and PSAQ-ViT V2, with a naive quantization strategy and without access to real-world data, consistently achieves competitive results, showing potential as a powerful baseline for data-free quantization of ViTs. For instance, with Swin-S as the (backbone) model, 8-bit quantization reaches 82.13% top-1 accuracy on ImageNet, 50.9 box AP and 44.1 mask AP on COCO, and 47.2 mIoU on ADE20K. We hope that the accurate and general PSAQ-ViT V2 can serve as a practical solution in real-world applications involving sensitive data. Code is released and merged at: https://github.com/zkkli/PSAQ-ViT.
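To make the phrase "naive quantization strategy" concrete, the following is a minimal sketch of uniform min-max 8-bit quantization. It assumes that "naive" refers to a plain uniform quantizer of this kind; the abstract does not specify the exact scheme, and the function names `quantize_uniform` and `dequantize_uniform` are hypothetical, not taken from the PSAQ-ViT repository.

```python
# Illustrative sketch only: a plain uniform min-max quantizer.
# Assumption: this is one common reading of "naive quantization";
# it is not the code released with PSAQ-ViT V2.

def quantize_uniform(values, num_bits=8):
    """Map floats to integer codes in [0, 2**num_bits - 1]."""
    qmax = 2 ** num_bits - 1
    lo, hi = min(values), max(values)
    # Guard against a constant input, where the range is zero.
    scale = (hi - lo) / qmax if hi > lo else 1.0
    codes = [round((v - lo) / scale) for v in values]
    # Clamp to the valid code range in case of rounding at the edges.
    codes = [min(max(c, 0), qmax) for c in codes]
    return codes, scale, lo


def dequantize_uniform(codes, scale, lo):
    """Recover approximate floats from integer codes."""
    return [c * scale + lo for c in codes]


weights = [-1.0, -0.25, 0.0, 0.5, 1.0]  # toy values, not real model weights
codes, scale, lo = quantize_uniform(weights)
restored = dequantize_uniform(codes, scale, lo)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
```

With this scheme the reconstruction error of each value is bounded by half a quantization step (`scale / 2`). That error is what a teacher-student procedure would try to compensate for when calibrating the quantized model against the full-precision one.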