Many companies safeguard their trained deep models (e.g., architecture details, learned weights, and training details) from third-party users by exposing them only as black boxes through APIs. Moreover, they may not even provide access to the training data, for proprietary reasons or because of sensitivity concerns. In this work, we propose a novel defense mechanism that protects black box models against adversarial attacks in a data-free setup. We construct synthetic data via a generative model and train a surrogate network using model stealing techniques. To minimize adversarial contamination in perturbed samples, we propose a 'wavelet noise remover' (WNR) that performs discrete wavelet decomposition on input images and carefully selects only a few important coefficients, as determined by our 'wavelet coefficient selection module' (WCSM). To recover the high-frequency content lost when WNR removes noise, we further train a 'regenerator' network to retrieve the coefficients such that the reconstructed image yields predictions on the surrogate model similar to those of the original image. At test time, WNR combined with the trained regenerator network is prepended to the black box network, which substantially boosts adversarial accuracy. Under the state-of-the-art AutoAttack, our method improves adversarial accuracy on CIFAR-10 by 38.98% and 32.01% over the baseline, even when the attacker uses a surrogate architecture (Alexnet-half and Alexnet) similar to the black box architecture (Alexnet) together with the same model stealing strategy as the defender. The code is available at https://github.com/vcl-iisc/data-free-black-box-defense
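To make the WNR step concrete, here is a minimal sketch under several assumptions that are not taken from the abstract. It uses a single-level orthonormal Haar wavelet on one grayscale channel. In place of the WCSM, which the abstract does not specify, it keeps the largest-magnitude high-frequency coefficients. The names `haar_dwt2`, `haar_idwt2`, `wavelet_noise_remover`, `defended_predict`, and `keep_fraction`, and the `regenerator` and `black_box` callables, are all hypothetical. This is an illustration of the idea, not the authors' implementation.

```python
import numpy as np


def haar_dwt2(x):
    """Single-level orthonormal 2D Haar DWT of an (H, W) array with even H and W."""
    a, b = x[0::2, 0::2], x[0::2, 1::2]
    c, d = x[1::2, 0::2], x[1::2, 1::2]
    ll = (a + b + c + d) / 2.0  # low-frequency approximation
    lh = (a - b + c - d) / 2.0  # detail: differences across columns
    hl = (a + b - c - d) / 2.0  # detail: differences across rows
    hh = (a - b - c + d) / 2.0  # diagonal detail
    return ll, (lh, hl, hh)


def haar_idwt2(ll, details):
    """Exact inverse of haar_dwt2."""
    lh, hl, hh = details
    h, w = ll.shape
    x = np.empty((2 * h, 2 * w), dtype=ll.dtype)
    x[0::2, 0::2] = (ll + lh + hl + hh) / 2.0
    x[0::2, 1::2] = (ll - lh + hl - hh) / 2.0
    x[1::2, 0::2] = (ll + lh - hl - hh) / 2.0
    x[1::2, 1::2] = (ll - lh - hl + hh) / 2.0
    return x


def wavelet_noise_remover(x, keep_fraction=0.1):
    """Keep the LL band plus only the strongest detail coefficients.

    Assumption: magnitude-based top-k selection stands in for the paper's
    WCSM, whose actual selection rule is not given in the abstract.
    """
    ll, details = haar_dwt2(x)
    stacked = np.stack(details)  # shape (3, H/2, W/2)
    k = max(1, int(round(keep_fraction * stacked.size)))
    threshold = np.sort(np.abs(stacked), axis=None)[-k]  # k-th largest magnitude
    mask = np.abs(stacked) >= threshold  # ties may keep slightly more than k
    kept = np.where(mask, stacked, 0.0)  # zero out the remaining detail coefficients
    return haar_idwt2(ll, tuple(kept)), mask


def defended_predict(x, regenerator, black_box, keep_fraction=0.1):
    """Hypothetical test-time pipeline: WNR -> regenerator -> black box model."""
    filtered, _ = wavelet_noise_remover(x, keep_fraction)
    return black_box(regenerator(filtered))
```

With `keep_fraction=1.0`, every coefficient survives and the input is reconstructed exactly. Smaller values return the low-pass band plus a few dominant details. The regenerator is the component that would learn to restore the discarded high-frequency content, and it is left abstract here. For RGB inputs such as CIFAR-10 images, the same function would be applied per channel.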