In a seller-buyer setting for machine learning models, the seller generates different copies of the original model and distributes them to different buyers, so that adversarial samples generated on one buyer's copy are unlikely to work on other copies. A known approach achieves this with an attractor-based rewriter that injects different attractors into different copies. This induces different adversarial regions in different copies, so adversarial samples generated on one copy cannot be replicated on the others. In this paper, we focus on a scenario in which multiple malicious buyers collude to attack. We first give two formulations and conduct empirical studies to analyze the effectiveness of collusion attacks under different assumptions about the attacker's capabilities and the properties of the attractors. We observe that existing attractor-based methods do not effectively mislead the colluders: as the number of colluders increases, the adversarial samples they find are influenced more by the original model than by the attractors. Based on this observation, we propose adaptive attractors whose weight is guided by a U-shaped curve to address this shortfall. Experimental results show that with our approach, the success rate of a collusion attack converges to around 15% even when many copies are used for collusion. In contrast, with the existing attractor-based rewriter using a fixed weight, the attack success rate increases linearly with the number of copies used for collusion.
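The observation that colluders' adversarial samples drift toward the original model can be illustrated with a toy model. The sketch below is hypothetical and is not the paper's rewriter or attack: it assumes each copy's input gradient is the original model's gradient `g0` plus a fixed-weight attractor term `weight * a_i` pointing in an independent random direction, and that colluders simply average the gradients of their `k` copies. The function names and parameters (`d`, `weight`, `seed`) are illustrative assumptions.

```python
import math
import random


def random_unit(d, rng):
    """Draw a random direction uniformly on the unit sphere in d dimensions."""
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)


def colluded_gradient(g0, attractors, weight):
    """Average of the per-copy gradients g0 + weight * a_i over the colluding copies
    (toy assumption: every copy uses the same fixed attractor weight)."""
    k = len(attractors)
    return [g0[j] + weight * sum(a[j] for a in attractors) / k for j in range(len(g0))]


def alignment_with_original(k, d=512, weight=3.0, seed=0):
    """Cosine between the colluders' averaged gradient and the original model's gradient."""
    rng = random.Random(seed)
    g0 = random_unit(d, rng)
    attractors = [random_unit(d, rng) for _ in range(k)]
    return cosine(colluded_gradient(g0, attractors, weight), g0)


if __name__ == "__main__":
    for k in (1, 2, 4, 8, 16, 32, 64):
        print(k, round(alignment_with_original(k), 3))
```

Because independent random attractor directions partly cancel when averaged, their combined contribution shrinks roughly as `weight / sqrt(k)`, while the original model's gradient is shared by every copy and does not cancel. The averaged gradient therefore aligns increasingly with `g0` as `k` grows, which mirrors, in this simplified setting only, why a fixed attractor weight loses its misleading effect against many colluders.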