While multi-exit neural networks are regarded as a promising solution for efficient inference via early exits, combating adversarial attacks on them remains a challenging problem. In multi-exit networks, due to the high dependency among different submodels, an adversarial example targeting a specific exit not only degrades the performance of the target exit but also simultaneously reduces the performance of all other exits. This makes multi-exit networks highly vulnerable to simple adversarial attacks. In this paper, we propose NEO-KD, a knowledge-distillation-based adversarial training strategy that tackles this fundamental challenge through two key contributions. First, NEO-KD uses neighbor knowledge distillation to guide the outputs of adversarial examples toward the ensemble outputs of neighboring exits on clean data. Second, NEO-KD employs exit-wise orthogonal knowledge distillation to reduce adversarial transferability across different submodels. The result is significantly improved robustness against adversarial attacks. Experimental results on various datasets and models show that our method achieves the best adversarial accuracy with reduced computation budgets, compared to baselines relying on existing adversarial training or knowledge distillation techniques for multi-exit networks.
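To make the two distillation terms more concrete, here is a minimal, framework-free sketch of how a per-sample training loss combining them might look. It is an illustration under stated assumptions, not the paper's exact formulation. The following are all assumptions: the neighbors of exit i are taken to be exits i-1, i and i+1 (clipped at the ends); their clean softmax outputs are combined by a plain average; distillation uses KL divergence; and the "orthogonal" soft labels are approximated by splitting the non-ground-truth classes round-robin into disjoint groups, one per exit. The names `neighbor_targets`, `orthogonal_masks` and `neo_kd_style_loss` and the weights `alpha`/`beta` are hypothetical. Generating the adversarial logits (e.g. with PGD) is out of scope here.

```python
import math
from typing import List

Vector = List[float]


def softmax(logits: Vector) -> Vector:
    """Numerically stable softmax over one vector of class logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_div(p: Vector, q: Vector, eps: float = 1e-12) -> float:
    """KL(p || q) with p the teacher and q the student; zero-mass teacher classes are skipped."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def neighbor_targets(clean_logits: List[Vector]) -> List[Vector]:
    """ASSUMPTION: target for exit i = average clean softmax of exits i-1, i, i+1 (clipped)."""
    probs = [softmax(z) for z in clean_logits]
    n = len(probs)
    targets = []
    for i in range(n):
        nbrs = probs[max(0, i - 1):min(n, i + 2)]
        targets.append([sum(col) / len(nbrs) for col in zip(*nbrs)])
    return targets


def orthogonal_masks(num_exits: int, num_classes: int, label: int) -> List[List[int]]:
    """HYPOTHETICAL: each exit keeps the ground-truth class plus a disjoint,
    round-robin share of the non-ground-truth classes."""
    others = [c for c in range(num_classes) if c != label]
    masks = []
    for i in range(num_exits):
        keep = {label} | {c for j, c in enumerate(others) if j % num_exits == i}
        masks.append([1 if c in keep else 0 for c in range(num_classes)])
    return masks


def masked_renormalize(p: Vector, mask: List[int]) -> Vector:
    """Zero out masked classes and rescale the remaining probability mass to sum to 1."""
    kept = [pi * m for pi, m in zip(p, mask)]
    total = sum(kept)
    return [k / total for k in kept]


def neo_kd_style_loss(adv_logits: List[Vector], clean_logits: List[Vector],
                      label: int, alpha: float = 1.0, beta: float = 1.0) -> float:
    """Average over exits of alpha * neighbor-KD term + beta * orthogonal-KD term."""
    n = len(clean_logits)
    nkd_targets = neighbor_targets(clean_logits)
    masks = orthogonal_masks(n, len(clean_logits[0]), label)
    total = 0.0
    for i, adv in enumerate(adv_logits):
        q = softmax(adv)
        # Neighbor KD: pull the adversarial output toward the clean neighbor ensemble.
        nkd = kl_div(nkd_targets[i], q)
        # Orthogonal KD: exit-specific soft labels whose non-ground-truth classes do
        # not overlap with those of other exits (the sketch's stand-in for "orthogonal").
        teacher = masked_renormalize(softmax(clean_logits[i]), masks[i])
        eokd = kl_div(teacher, q)
        total += alpha * nkd + beta * eokd
    return total / n


if __name__ == "__main__":
    # Toy logits for 3 exits and 4 classes; the ground-truth class is 0.
    clean = [[2.0, 0.5, 0.1, -1.0], [2.5, 0.2, 0.3, -0.5], [3.0, 0.1, 0.0, -0.2]]
    adv = [[0.1, 1.5, 0.2, 0.0], [0.3, 1.2, 0.9, 0.0], [1.0, 0.8, 0.4, 0.1]]
    print(neo_kd_style_loss(adv, clean, label=0))
```

The disjoint class split is the design choice that stands in for orthogonality in this sketch. Because each exit is taught to place its wrong-class probability mass on a different set of classes, a perturbation that pushes one exit toward a particular wrong class is, by construction, not what the other exits were trained to favor. A real implementation would operate on batched tensors and would likely randomize the split.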