Companies often safeguard their trained deep models (e.g., architecture details, learned weights, and training details) from third-party users by exposing them only as black boxes through APIs. Moreover, they may not provide access to the training data at all, for proprietary reasons or because of data sensitivity. In this work, we propose a novel defense mechanism that protects black-box models against adversarial attacks in a data-free setup. We construct synthetic data with a generative model and train a surrogate network using model stealing techniques. To minimize adversarial contamination in perturbed samples, we propose a 'wavelet noise remover' (WNR) that performs a discrete wavelet decomposition of the input image and carefully selects only a few important coefficients, determined by our 'wavelet coefficient selection module' (WCSM). To recover the high-frequency content of the image lost during noise removal by WNR, we further train a 'regenerator' network whose objective is to retrieve the coefficients such that the reconstructed image yields predictions on the surrogate model similar to those of the original image. At test time, WNR combined with the trained regenerator network is prepended to the black-box network, resulting in a large boost in adversarial accuracy. Under the state-of-the-art Auto Attack on CIFAR-10, our method improves the adversarial accuracy over the baseline by 38.98% and 32.01%, respectively, even when the attacker uses a surrogate architecture (Alexnet-half and Alexnet) similar to the black-box architecture (Alexnet) with the same model stealing strategy as the defender. The code is available at https://github.com/vcl-iisc/data-free-black-box-defense
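To make the WNR idea concrete, the following is a minimal, hypothetical sketch rather than the authors' implementation. It assumes a single-level orthonormal 2D Haar transform on a grayscale image with even height and width, and it uses a simple "keep the k largest-magnitude detail coefficients" rule as a stand-in for WCSM, whose actual selection criterion is not described here. The function names `haar_dwt2`, `haar_idwt2`, `select_top_k`, and `wavelet_noise_remover` are illustrative only, and the regenerator network is not modelled.

```python
# Hypothetical sketch of a wavelet noise remover in the spirit of WNR (not the authors' code).
# Assumptions: one-level orthonormal 2D Haar DWT, grayscale image with even height/width,
# and a top-k magnitude rule standing in for WCSM. The regenerator is not modelled.

Matrix = list[list[float]]


def haar_dwt2(img: Matrix) -> tuple[Matrix, Matrix, Matrix, Matrix]:
    """Return the approximation band (ll) and three detail bands of a one-level Haar DWT."""
    h, w = len(img), len(img[0])
    ll, lh, hl, hh = ([[0.0] * (w // 2) for _ in range(h // 2)] for _ in range(4))
    for i in range(0, h, 2):
        for j in range(0, w, 2):
            # The 2x2 block [[a, b], [c, d]]; each coefficient is a unit-norm combination of it.
            a, b = img[i][j], img[i][j + 1]
            c, d = img[i + 1][j], img[i + 1][j + 1]
            ll[i // 2][j // 2] = (a + b + c + d) / 2  # scaled local average
            lh[i // 2][j // 2] = (a - b + c - d) / 2  # left-right difference
            hl[i // 2][j // 2] = (a + b - c - d) / 2  # top-bottom difference
            hh[i // 2][j // 2] = (a - b - c + d) / 2  # diagonal difference
    return ll, lh, hl, hh


def haar_idwt2(ll: Matrix, lh: Matrix, hl: Matrix, hh: Matrix) -> Matrix:
    """Invert haar_dwt2 exactly (the transform is orthonormal)."""
    img = [[0.0] * (2 * len(ll[0])) for _ in range(2 * len(ll))]
    for i in range(len(ll)):
        for j in range(len(ll[0])):
            s, x, y, z = ll[i][j], lh[i][j], hl[i][j], hh[i][j]
            img[2 * i][2 * j] = (s + x + y + z) / 2
            img[2 * i][2 * j + 1] = (s - x + y - z) / 2
            img[2 * i + 1][2 * j] = (s + x - y - z) / 2
            img[2 * i + 1][2 * j + 1] = (s - x - y + z) / 2
    return img


def select_top_k(details: list[Matrix], k: int) -> list[Matrix]:
    """Keep the k largest-magnitude detail coefficients and zero the rest (stand-in for WCSM)."""
    ranked = sorted(
        ((abs(v), b, i, j) for b, band in enumerate(details)
         for i, row in enumerate(band) for j, v in enumerate(row)),
        reverse=True,
    )
    keep = {(b, i, j) for _, b, i, j in ranked[:k]}
    return [[[v if (b, i, j) in keep else 0.0 for j, v in enumerate(row)]
             for i, row in enumerate(band)]
            for b, band in enumerate(details)]


def wavelet_noise_remover(img: Matrix, k: int) -> Matrix:
    """Decompose, keep ll plus the top-k detail coefficients, and reconstruct."""
    ll, *details = haar_dwt2(img)
    lh, hl, hh = select_top_k(details, k)
    return haar_idwt2(ll, lh, hl, hh)


print(wavelet_noise_remover([[1, 2], [3, 4]], k=0))  # [[2.5, 2.5], [2.5, 2.5]]
```

With `k=0` every 2x2 block collapses to its mean, so fine image detail is discarded together with any perturbation; recovering that lost high-frequency content is the role the abstract assigns to the trained regenerator.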