The main premise of federated learning (FL) is that machine learning model updates are computed locally to preserve user data privacy. By design, this approach prevents user data from ever leaving the perimeter of the device. Once the updates are aggregated, the model is broadcast to all nodes in the federation. However, without proper defenses, compromised nodes can probe the model inside their local memory in search of adversarial examples, which can lead to dangerous real-world scenarios. For instance, in image-based applications, adversarial examples are images perturbed imperceptibly to the human eye that are nonetheless misclassified by the local model. These adversarial images are later presented to a victim node's counterpart model to replay the attack. Typical dissemination strategies include altered traffic signs (patch attacks) that are no longer recognized by autonomous vehicles, or seemingly unaltered samples that poison the local dataset of the FL scheme to undermine its robustness. Pelta is a novel shielding mechanism leveraging Trusted Execution Environments (TEEs) that reduces the ability of attackers to craft adversarial samples. Pelta masks, inside the TEE, the first part of the back-propagation chain rule, which attackers typically exploit to craft the malicious samples. We evaluate Pelta on state-of-the-art accurate models using three well-established datasets: CIFAR-10, CIFAR-100 and ImageNet. We show the effectiveness of Pelta in mitigating six white-box state-of-the-art adversarial attacks, such as Projected Gradient Descent, Momentum Iterative Method, Auto Projected Gradient Descent and the Carlini & Wagner attack. In particular, to the best of our knowledge, Pelta constitutes the first attempt at defending an ensemble model against the Self-Attention Gradient attack. Our code is available to the research community at https://github.com/queyrusi/Pelta.
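To make the role of the chain rule concrete, the following toy sketch (hypothetical, not Pelta's implementation) assumes a two-stage linear model x → h = W1·x → y = w2·h with loss L = y, where the first stage and its weights W1 are assumed to sit inside the TEE. The gradient-sign attacks named above (e.g. Projected Gradient Descent) need the input gradient dL/dx = W1ᵀ·(dL/dh). An attacker observing only the clear stage can compute dL/dh, but forming dL/dx requires the masked first factor. All function names are illustrative assumptions; real models and the exact layers Pelta shields differ.

```python
# Toy illustration (hypothetical names, not Pelta's code).
# Model: x -> h = W1 x (assumed shielded in the TEE) -> y = w2 . h (clear stage).
# Loss L = y, so by the chain rule dL/dx = W1^T (dL/dh).

def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in W]

def transpose(W):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*W)]

def forward(W1, w2, x):
    """Run both stages; return the shielded activation h and output y."""
    h = matvec(W1, x)
    y = sum(a * b for a, b in zip(w2, h))
    return h, y

def grad_wrt_h(w2):
    """dL/dh for L = w2 . h is w2: computable from the clear stage alone."""
    return list(w2)

def grad_wrt_x(W1, g_h):
    """dL/dx = W1^T g_h: needs W1, i.e. the masked first part of the chain."""
    return matvec(transpose(W1), g_h)

def sign(v):
    """Elementwise sign in {-1, 0, 1}."""
    return [(a > 0) - (a < 0) for a in v]

def gradient_sign_step(x, g_x, eps):
    """One gradient-sign step of size eps, as iterated by PGD-style attacks."""
    return [xi + eps * s for xi, s in zip(x, sign(g_x))]
```

For example, with W1 = [[1.0, -2.0], [0.5, 1.0]], w2 = [3.0, -1.0] and x = [1.0, 1.0], the clear-stage gradient is dL/dh = [3.0, -1.0], while the input gradient is dL/dx = [2.5, -7.0]; only the latter drives the step x → [1.1, 0.9] for eps = 0.1, and computing it requires W1.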