Mitigating Adversarial Attacks by Distributing Different Copies to Different Users

Machine learning models are vulnerable to adversarial attacks. In this paper, we consider a scenario in which a model is distributed to multiple buyers, one of whom is malicious and attempts to attack another buyer. The malicious buyer probes its own copy of the model to search for adversarial samples, then presents the samples it finds to the victim's copy in order to replicate the attack. We point out that distributing different copies of the model to different buyers can mitigate this attack, so that adversarial samples found on one copy do not work on another. We observe that training copies with different randomness does mitigate such replication to a certain degree. However, this offers no guarantee, and retraining is computationally expensive. Several works have extended the retraining approach to enhance the differences among models, but such methods can produce only a very limited number of models, and their computational cost is even higher. We therefore propose a flexible parameter-rewriting method that directly modifies the model's parameters. This method requires no additional training and can generate a large number of copies in a more controllable manner, each of which induces different adversarial regions. Experiments show that rewriting can significantly mitigate the attacks while retaining high classification accuracy. For instance, on the GTSRB dataset under the HopSkipJump attack, an attractor-based rewriter can reduce the success rate of replicating the attack to 0.5%, whereas independently training copies with different randomness can reduce it to 6.5%. We believe this study opens many further directions worth exploring.
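
The toy sketch below illustrates the evaluation idea described above: adversarial samples found by probing one buyer's copy are replayed against another buyer's copy, and the replication success rate is the fraction that still fool the victim. Everything here is an assumption for illustration, not the authors' method: the model is a hypothetical linear binary classifier, `rewrite_copy` is a naive seeded-noise stand-in for parameter rewriting (the abstract does not specify how the attractor-based rewriter works), `minimal_flip` stands in for the attacker's search (it is not HopSkipJump), and counting only inputs that both copies classify correctly is an assumed definition of the metric.

```python
import random
from typing import List, Optional, Sequence


class LinearClassifier:
    """Hypothetical toy binary classifier: predicts 1 if w.x + b > 0, else 0."""

    def __init__(self, weights: Sequence[float], bias: float) -> None:
        self.weights = list(weights)
        self.bias = bias

    def score(self, x: Sequence[float]) -> float:
        return sum(w * xi for w, xi in zip(self.weights, x)) + self.bias

    def predict(self, x: Sequence[float]) -> int:
        return 1 if self.score(x) > 0 else 0


def rewrite_copy(base: LinearClassifier, buyer_id: int, scale: float = 0.3) -> LinearClassifier:
    """Hypothetical rewriter: derive a buyer-specific copy by adding seeded Gaussian
    noise to the parameters, with no retraining. A naive stand-in, NOT the
    paper's attractor-based rewriter."""
    rng = random.Random(buyer_id)  # same buyer_id always yields the same copy
    weights = [w + rng.gauss(0.0, scale) for w in base.weights]
    return LinearClassifier(weights, base.bias + rng.gauss(0.0, scale))


def minimal_flip(model: LinearClassifier, x: Sequence[float], margin: float = 1e-3) -> List[float]:
    """Stand-in for the attacker's search on its own copy: the smallest L2 step
    along the weight vector that pushes x just past the decision boundary."""
    s = model.score(x)
    sign = 1.0 if s > 0 else -1.0
    t = (s + sign * margin) / sum(w * w for w in model.weights)
    # New score is s - t * ||w||^2 = -sign * margin, i.e. the opposite side.
    return [xi - t * w for xi, w in zip(x, model.weights)]


def replication_rate(attacker: LinearClassifier, victim: LinearClassifier,
                     label_model: LinearClassifier,
                     inputs: Sequence[Sequence[float]]) -> Optional[float]:
    """Fraction of adversarial samples found on `attacker` that also fool `victim`.
    Only inputs that both copies classify correctly are counted (assumed definition).
    Returns None if no input is eligible."""
    attempts = successes = 0
    for x in inputs:
        label = label_model.predict(x)  # toy ground truth: the base model's label
        if attacker.predict(x) != label or victim.predict(x) != label:
            continue
        x_adv = minimal_flip(attacker, x)
        if attacker.predict(x_adv) == label:
            continue  # the probe failed even on the attacker's own copy
        attempts += 1
        if victim.predict(x_adv) != label:
            successes += 1
    return successes / attempts if attempts else None


if __name__ == "__main__":
    rng = random.Random(0)
    base = LinearClassifier([1.0, -0.5, 0.8, 0.3, -1.2], 0.1)
    copies = {i: rewrite_copy(base, i) for i in (1, 2)}
    data = [[rng.uniform(-1, 1) for _ in range(5)] for _ in range(200)]
    print("same copy:", replication_rate(copies[1], copies[1], base, data))
    print("different copies:", replication_rate(copies[1], copies[2], base, data))
```

Replaying samples on the attacker's own copy always succeeds (rate 1.0), which is the situation a single shared model creates; replaying them on a differently rewritten copy typically succeeds less often, because minimal-perturbation samples sit just past the attacker's boundary rather than the victim's.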

