Adversarial patch attacks pose a significant threat to the practical deployment of deep learning systems. However, existing research focuses primarily on image pre-processing defenses, which often reduce classification accuracy on clean images and fail to effectively counter physically realizable attacks. In this paper, we study the behavior of adversarial patches as anomalies within the distribution of image information and leverage this insight to develop a robust defense strategy. Our defense employs the density-based clustering algorithm DBSCAN to isolate anomalous image segments, applied within a three-stage pipeline of Segmenting, Isolating, and Blocking phases that identifies and mitigates adversarial noise. Once adversarial components are identified, we neutralize them by replacing them with the mean pixel value, which outperforms alternative replacement strategies. Our model-agnostic defense is evaluated across multiple models and datasets, demonstrating its effectiveness against various adversarial patch attacks in image classification tasks. It significantly improves accuracy against LaVAN and GoogleAp attacks, from 38.8\% without the defense to 67.1\% with it, surpassing prominent state-of-the-art methods such as LGS (53.86\%) and Jujutsu (60\%).
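The Segmenting, Isolating, and Blocking phases described above can be sketched as follows. This is a minimal illustrative implementation, not the authors' actual method: the tile size, the per-tile features (channel-wise mean and standard deviation), and the DBSCAN parameters `eps` and `min_samples` are all hypothetical choices introduced here for demonstration.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def defend(image, patch_size=8, eps=0.1, min_samples=5):
    """Sketch of a Segment-Isolate-Block defense against adversarial patches.

    Assumes `image` is an (H, W, C) float array; patch_size, eps, and
    min_samples are illustrative values, not tuned settings from the paper.
    """
    h, w, _ = image.shape
    feats, coords = [], []

    # Segmenting: split the image into non-overlapping tiles and compute
    # a simple descriptor per tile (mean and std of each channel).
    for i in range(0, h, patch_size):
        for j in range(0, w, patch_size):
            tile = image[i:i + patch_size, j:j + patch_size]
            feats.append(np.concatenate([tile.mean(axis=(0, 1)),
                                         tile.std(axis=(0, 1))]))
            coords.append((i, j))
    feats = np.array(feats)

    # Isolating: DBSCAN labels tiles that fall outside every dense cluster
    # as noise (label -1); these are treated as anomalous segments.
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(feats)

    # Blocking: neutralize anomalous tiles by replacing them with the
    # image's mean pixel value, as the abstract describes.
    out = image.copy()
    mean_pixel = image.mean(axis=(0, 1))
    for (i, j), label in zip(coords, labels):
        if label == -1:
            out[i:i + patch_size, j:j + patch_size] = mean_pixel
    return out
```

On a smooth image containing one high-variance region, the descriptor of that region sits far from the dense cluster of background tiles, so DBSCAN flags it as noise and the Blocking phase overwrites it with the mean pixel value; clean tiles pass through unchanged.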