Deep learning has brought convenience and enabled major advances, yet its sensitivity to adversarial attacks undermines its trustworthiness. Attackers can exploit this sensitivity to manipulate predictions. Existing anti-adversarial methods typically counteract adversarial perturbations after an attack has occurred. We instead propose a proactive strategy that protects media in advance, neutralizing potential adversarial effects before a third party mounts an attack. This strategy, dubbed Fast Preemption, provides an efficient, transferable preemptive defense by using different models for labeling inputs and for learning crucial features. A forward-backward cascade learning algorithm computes the protective perturbations: it begins with forward propagation optimization to achieve rapid convergence, then applies iterative backward propagation learning to alleviate overfitting. The strategy offers state-of-the-art transferability and protection across various systems. Running only three steps, our Fast Preemption framework outperforms benchmark training-time, test-time, and preemptive adversarial defenses. We have also devised what is, to our knowledge, the first effective white-box adaptive reversion attack, and we demonstrate that the protection added by our defense is irreversible unless the backbone model, algorithm, and settings are fully compromised. This work offers a new direction for developing proactive defenses against adversarial attacks.
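To make the general idea of a preemptive perturbation concrete, here is a minimal sketch under strong simplifying assumptions. It is not the forward-backward cascade algorithm described in the abstract: it uses plain signed-gradient descent on a toy linear softmax model, and every name in it (`label_input`, `preempt`, `W_classifier`, `W_backbone`, `eps`, `steps`) is hypothetical. It only illustrates three points taken from the abstract: one model labels the input, another supplies the gradients, and the input is nudged for a small fixed number of steps (three here) so that the loss for its label decreases, which is the opposite of what an adversarial attack does.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over a 1-D logit vector."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()


def cross_entropy(W, x, y):
    """Cross-entropy loss of the toy linear model (logits = W @ x) for label y."""
    return -np.log(softmax(W @ x)[y])


def cross_entropy_grad(W, x, y):
    """Gradient of the cross-entropy loss with respect to the input x."""
    p = softmax(W @ x)
    p[y] -= 1.0          # p - one_hot(y)
    return W.T @ p


def label_input(W_classifier, x):
    """Assumption: a separate 'classifier' model assigns the label to protect."""
    return int(np.argmax(W_classifier @ x))


def preempt(W_backbone, x, y, eps=0.1, steps=3):
    """Return a protected copy of x that stays within an L-infinity ball of radius eps.

    Each step moves AGAINST the loss gradient (a descent step), so the
    backbone model's loss for label y goes down instead of up.
    """
    delta = np.zeros_like(x)
    alpha = eps / steps  # step size chosen so that `steps` steps can reach the budget
    for _ in range(steps):
        g = cross_entropy_grad(W_backbone, x + delta, y)
        delta = np.clip(delta - alpha * np.sign(g), -eps, eps)
    return x + delta
```

A caller would first obtain `y = label_input(W_classifier, x)` and then call `preempt(W_backbone, x, y)`. Using the same matrix for both roles is the simplest case; passing different matrices mimics the separation between labeling and feature learning that the abstract describes.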