Adversarial attacks pose a significant threat to the robustness of deep neural networks (DNNs). Despite the many defensive methods in use, DNNs remain vulnerable to poisoning attacks, in which attackers tamper with the initial training data. To defend DNNs against such attacks, this work proposes a novel method that combines defensive distillation with a denoising autoencoder (DAE). The method aims to reduce the sensitivity of the distilled model to poisoning attacks by detecting and reconstructing poisoned adversarial inputs in the training data. To assess its performance, we added carefully crafted adversarial samples to the initial training data. Our experimental results show that the method successfully identified and reconstructed the poisoned inputs while also improving the DNN's resilience. The proposed approach offers a robust defense mechanism for DNNs in applications where data poisoning is a concern, and thereby overcomes the limitation that poisoning attacks impose on defensive distillation.
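The two ingredients named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the temperature-scaled softmax is the standard way defensive distillation produces soft labels, but the thresholding rule in `clean_training_set`, the `threshold` value, and the `reconstruct` callable (a stand-in for a trained DAE) are all assumptions made for illustration only.

```python
import math
from typing import Callable, List, Sequence, Tuple


def softmax_with_temperature(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax; a higher temperature yields softer labels."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def reconstruction_error(x: Sequence[float], x_hat: Sequence[float]) -> float:
    """Mean squared error between an input and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def clean_training_set(
    samples: Sequence[Sequence[float]],
    reconstruct: Callable[[Sequence[float]], Sequence[float]],
    threshold: float,
) -> Tuple[List[List[float]], List[int]]:
    """Flag samples with high reconstruction error and swap in the reconstruction.

    Assumption: a high reconstruction error is used as the signal that a
    sample is poisoned. The source does not specify its detection rule.
    """
    cleaned: List[List[float]] = []
    flagged: List[int] = []
    for index, x in enumerate(samples):
        x_hat = reconstruct(x)
        if reconstruction_error(x, x_hat) > threshold:
            cleaned.append(list(x_hat))  # replace the suspected poisoned input
            flagged.append(index)
        else:
            cleaned.append(list(x))  # keep the original sample unchanged
    return cleaned, flagged


def clip_to_unit_range(x: Sequence[float]) -> List[float]:
    """Hypothetical stand-in for a trained DAE: clips values into [0, 1]."""
    return [min(1.0, max(0.0, v)) for v in x]


if __name__ == "__main__":
    data = [[0.2, 0.4], [0.2, 3.0]]  # the second sample is out of range
    cleaned, flagged = clean_training_set(data, clip_to_unit_range, threshold=0.1)
    print(flagged, cleaned)
    print(softmax_with_temperature([2.0, 1.0, 0.0], temperature=20.0))
```

In a real pipeline, `reconstruct` would be the forward pass of a DAE trained on clean data, and the cleaned set would then be used to train the teacher and the distilled model on soft labels.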