Machine learning is a powerful tool that enables the automation of a large number of tasks without explicit programming. Despite recent progress across many domains, machine learning models remain vulnerable to adversarial threats, which aim to prevent a model from meeting its objective. An adversary can craft perturbations that are imperceptible to the human eye yet cause misclassification at inference time. In this paper, we propose a defense system that incorporates an adversarial training module into a mixture-of-experts (MoE) architecture to improve its robustness against white-box evasion attacks. The system uses nine pre-trained classifiers (experts), each with a ResNet-18 backbone. During end-to-end training, the parameters of all experts and of the gating mechanism are updated jointly, which allows the experts to be optimized further. The proposed system outperforms prior MoE-based defenses under strong white-box FGSM and PGD evaluation on CIFAR-10 and SVHN. Using multiple experts increases training time and compute relative to single-network baselines; however, inference cost scales approximately linearly with the number of experts and is substantially lower than the cost of training.
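To make the white-box setting concrete, the following is a minimal sketch of a single-step FGSM perturbation crafted against a gated mixture of experts. It is an illustration under stated assumptions, not the system described above: the experts are tiny linear softmax classifiers rather than ResNet-18 networks, the toy uses three experts rather than nine, the input gradient is estimated by central finite differences instead of backpropagation, and every name (`moe_probs`, `input_grad`, `fgsm`, the budget `eps = 8/255`) is hypothetical.

```python
import math
import random


def softmax(z):
    """Numerically stable softmax over a list of scores."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def linear(W, x):
    """Matrix-vector product; W is a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def moe_probs(x, experts, gate):
    """Gate-weighted average of the experts' class probabilities."""
    weights = softmax(linear(gate, x))                 # one weight per expert
    outs = [softmax(linear(W, x)) for W in experts]    # per-expert class probabilities
    return [sum(g * p[c] for g, p in zip(weights, outs))
            for c in range(len(outs[0]))]


def loss(x, y, experts, gate):
    """Cross-entropy of the mixture output for true label y."""
    return -math.log(moe_probs(x, experts, gate)[y] + 1e-12)


def input_grad(x, y, experts, gate, h=1e-5):
    """Central finite differences; stands in for backprop in this toy (assumption)."""
    grad = []
    for i in range(len(x)):
        hi = x[:i] + [x[i] + h] + x[i + 1:]
        lo = x[:i] + [x[i] - h] + x[i + 1:]
        grad.append((loss(hi, y, experts, gate) - loss(lo, y, experts, gate)) / (2 * h))
    return grad


def fgsm(x, y, experts, gate, eps):
    """One signed-gradient step of size eps, clipped to the valid range [0, 1]."""
    g = input_grad(x, y, experts, gate)
    stepped = [xi + eps * ((gi > 0) - (gi < 0)) for xi, gi in zip(x, g)]
    return [min(1.0, max(0.0, v)) for v in stepped]


# Toy setup: 3 experts, 4 input features, 3 classes (all sizes are illustrative).
random.seed(0)
n_experts, n_features, n_classes = 3, 4, 3
experts = [[[random.uniform(-1, 1) for _ in range(n_features)]
            for _ in range(n_classes)] for _ in range(n_experts)]
gate = [[random.uniform(-1, 1) for _ in range(n_features)]
        for _ in range(n_experts)]
x = [random.random() for _ in range(n_features)]
y = 0

x_adv = fgsm(x, y, experts, gate, eps=8 / 255)
print("clean loss:", loss(x, y, experts, gate))
print("adversarial loss:", loss(x_adv, y, experts, gate))
```

Because the attacker differentiates through both the gate and the experts, the perturbation targets the mixture as a whole, which is what a white-box evaluation assumes. In an adversarial training loop, examples such as `x_adv` would be fed back into the joint update of the experts and the gate; PGD would repeat the same signed step several times with projection back onto the `eps` ball.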