As a pivotal technique for improving the defense of deep models, adversarial robustness transfer via distillation has demonstrated remarkable success in conventional image classification tasks. However, this paradigm encounters critical challenges when applied to vision-language models (VLMs) such as CLIP: constructing an adversarially robust teacher for large-scale multi-modal models demands prohibitively high computational resources. We bridge this gap by revealing an interesting phenomenon: vanilla CLIP (without adversarial training) exhibits intrinsic defensive capability against adversarial examples generated by another CLIP with a different architecture. We formally define this property as proxy adversarial robustness and naturally propose a Heterogeneous Proxy Transfer (HPT) framework that establishes cross-architectural robustness distillation channels between CLIP variants, enabling VLM robustness transfer from proxy to target models with little overhead. However, such a proxy transfer paradigm easily induces severe overfitting, causing a sharp degradation in zero-shot natural generalization. To resolve this, we design Generalization-Pivot Decoupling (GPD), which exploits differences in learning rate scheduling to decouple the proxy transfer process into a generalization-anchored warm-up that preserves generalization and a generalization-pulled HPT stage that promotes adversarial robustness, achieving an equilibrium between natural generalization and adversarial robustness. Extensive experiments on 15 zero-shot datasets demonstrate the effectiveness of our HPT-GPD method. The code is available at github.com/fxw13/HPT-GPD.