Mixture-of-Experts (MoE) models have shown remarkable success in leveraging specialized expert networks for complex machine learning tasks. However, their susceptibility to adversarial attacks poses a critical challenge to deployment in robustness-critical applications. This paper addresses the question of how to incorporate robustness into MoEs while maintaining high natural accuracy. We begin by analyzing the vulnerability of MoE components, finding that expert networks are notably more susceptible to adversarial attacks than the router. Based on this insight, we propose a targeted robust training technique that integrates a novel loss function to enhance the adversarial robustness of MoEs, requiring the robustification of only one additional expert without compromising training or inference efficiency. Building on this, we introduce a dual-model strategy that linearly combines a standard MoE model with our robustified MoE model using a smoothing parameter, allowing flexible control over the robustness-accuracy trade-off. We further provide theoretical foundations by deriving certified robustness bounds for both the single MoE model and the dual model. To push the boundaries of robustness and accuracy, we propose a joint training strategy, JTDMoE, for the dual model; joint training improves both robustness and accuracy beyond what separately trained models achieve. Experimental results on the CIFAR-10 and TinyImageNet datasets, using ResNet18 and Vision Transformer (ViT) architectures, demonstrate the effectiveness of the proposed methods. The code is publicly available at https://github.com/TIML-Group/Robust-MoE-Dual-Model.
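As a rough illustration of the dual-model combination described above, the sketch below shows a convex combination of two model outputs under a smoothing parameter. The names `f_standard`, `f_robust`, and `alpha` are illustrative placeholders, not the paper's notation or released code.

```python
import torch

def dual_model_logits(f_standard, f_robust, x, alpha: float = 0.5):
    """Convex combination of a standard MoE and a robustified MoE.

    The smoothing parameter alpha controls the robustness-accuracy trade-off:
    alpha = 1 recovers the standard model, alpha = 0 the robustified one.
    All names here are placeholders for illustration only.
    """
    return alpha * f_standard(x) + (1.0 - alpha) * f_robust(x)

# Toy usage with stand-in callables (any models returning logits would work):
if __name__ == "__main__":
    f_standard = lambda x: torch.tensor([2.0, 0.5])  # stand-in standard MoE
    f_robust = lambda x: torch.tensor([1.0, 1.5])    # stand-in robustified MoE
    print(dual_model_logits(f_standard, f_robust, x=None, alpha=0.7))
```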