The paper introduces a white-box attack on computer vision models that uses SHAP values. It demonstrates how adversarial evasion attacks can degrade the performance of deep learning models by reducing output confidence or inducing misclassifications. Such attacks are particularly insidious because they deceive the algorithm's perception while remaining imperceptible to the human eye. The proposed attack leverages SHAP values to quantify the contribution of individual inputs to the output at the inference stage. The SHAP attack is compared against the well-known Fast Gradient Sign Method (FGSM). We find evidence that SHAP attacks are more robust at inducing misclassifications, particularly in gradient-hiding scenarios.
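As a minimal sketch of the two attacks being compared, the code below contrasts a one-step FGSM perturbation with a SHAP-guided perturbation, assuming a PyTorch classifier and the `shap` library's `GradientExplainer`. The specific SHAP-guided step here (perturbing only the top-attributed pixels against the sign of their attribution, controlled by the hypothetical `top_frac` parameter) is an illustrative assumption, not the paper's exact procedure.

```python
# Sketch only: illustrates a SHAP-guided evasion step next to FGSM.
# The top-k masking and sign-of-attribution step are assumptions for
# illustration, not the attack algorithm from the paper.
import torch
import torch.nn as nn
import shap

def fgsm_attack(model, x, y, eps=0.03):
    """FGSM: one step along the sign of the loss gradient w.r.t. the input."""
    x = x.clone().detach().requires_grad_(True)
    loss = nn.functional.cross_entropy(model(x), y)
    loss.backward()
    return (x + eps * x.grad.sign()).clamp(0, 1).detach()

def shap_attack(model, x, y, background, eps=0.03, top_frac=0.1):
    """SHAP-guided step: perturb only the pixels with the largest
    attributions toward the true class, stepping against them."""
    explainer = shap.GradientExplainer(model, background)
    sv = explainer.shap_values(x)
    # Older shap versions return a list (one array per class); newer ones
    # may return a single array with a trailing class axis.
    if isinstance(sv, list):
        sv_true = torch.as_tensor(sv[y.item()]).float()
    else:
        sv_true = torch.as_tensor(sv[..., y.item()]).float()
    flat = sv_true.abs().flatten()
    k = max(1, int(top_frac * flat.numel()))
    thresh = flat.topk(k).values.min()
    mask = (sv_true.abs() >= thresh).float()  # keep only top-k attributed pixels
    # Pixels that support the true class are pushed to weaken that support.
    return (x - eps * mask * torch.sign(sv_true)).clamp(0, 1).detach()

if __name__ == "__main__":
    torch.manual_seed(0)
    # Toy stand-in classifier; any image model would do in practice.
    model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                          nn.Flatten(), nn.Linear(8 * 32 * 32, 10)).eval()
    background = torch.rand(16, 3, 32, 32)  # reference samples for SHAP
    x = torch.rand(1, 3, 32, 32)
    y = model(x).argmax(dim=1)
    print("clean:", model(x).argmax(1).item())
    print("fgsm :", model(fgsm_attack(model, x, y)).argmax(1).item())
    print("shap :", model(shap_attack(model, x, y, background)).argmax(1).item())
```

One design point this sketch makes concrete: unlike FGSM, which perturbs every pixel, the SHAP-guided step touches only the inputs that the explainer deems influential, and it does not require backpropagating the loss gradient at attack time, which is consistent with the claimed advantage in gradient-hiding scenarios.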