The main premise of federated learning is that machine learning model updates are computed locally, in particular to preserve user data privacy, since the data never leaves the perimeter of the user's device. This mechanism assumes that the global model, once aggregated, is broadcast to collaborating, non-malicious nodes. However, without proper defenses, compromised clients can easily probe the model held in their local memory in search of adversarial examples. For instance, in image-based applications, adversarial examples are images perturbed imperceptibly to the human eye yet misclassified by the local model; they can later be presented to a victim node's counterpart model to replicate the attack. To mitigate such malicious probing, we introduce Pelta, a novel shielding mechanism leveraging trusted hardware. By harnessing the capabilities of Trusted Execution Environments (TEEs), Pelta masks part of the back-propagation chain rule, which attackers otherwise typically exploit to craft malicious samples. We evaluate Pelta on a state-of-the-art ensemble model and demonstrate its effectiveness against the Self-Attention Gradient Attack.
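To make the idea of masking part of the chain rule concrete, here is a toy sketch in pure Python. It is not Pelta's implementation. It assumes, for illustration only, a purely linear model whose first layer is the shielded part, and it uses a single FGSM-style signed-gradient step as a stand-in for a gradient-based attack (not the Self-Attention Gradient Attack itself). The names `ShieldedLayer`, `ClearLayer`, `input_gradient` and `fgsm_step` are hypothetical. With every layer in the clear, the attacker multiplies the transposed Jacobians back to the input. Once one layer's Jacobian stays inside the enclave, the product can no longer be completed.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Matrix-vector product m @ v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


class ClearLayer:
    """Linear layer whose weights sit in ordinary (probeable) memory."""

    def __init__(self, weights: Matrix):
        self.weights = weights

    def forward(self, x: Vector) -> Vector:
        return matvec(self.weights, x)

    def vjp(self, upstream: Vector) -> Vector:
        # Vector-Jacobian product: for a linear layer, J^T @ upstream = W^T @ upstream.
        return matvec(transpose(self.weights), upstream)


class ShieldedLayer:
    """Hypothetical layer whose weights live inside a TEE enclave.

    It answers forward queries but never exposes its weights or local Jacobian.
    """

    def __init__(self, weights: Matrix):
        self._weights = weights  # stands in for enclave-protected memory

    def forward(self, x: Vector) -> Vector:
        return matvec(self._weights, x)

    def vjp(self, upstream: Vector) -> Vector:
        raise PermissionError("local Jacobian is masked inside the enclave")


def score(layers, w_out: Vector, x: Vector) -> float:
    """True-class score s = w_out . (L_n ... L_1 x)."""
    h = x
    for layer in layers:
        h = layer.forward(h)
    return sum(a * b for a, b in zip(w_out, h))


def input_gradient(layers, w_out: Vector) -> Vector:
    """ds/dx via the chain rule: J_1^T ... J_n^T w_out.

    For linear layers the Jacobians do not depend on x, so no input is needed.
    """
    grad = w_out
    for layer in reversed(layers):
        grad = layer.vjp(grad)
    return grad


def fgsm_step(x: Vector, grad: Vector, eps: float) -> Vector:
    """One signed-gradient step that lowers the true-class score."""
    sign = [(g > 0) - (g < 0) for g in grad]
    return [xi - eps * si for xi, si in zip(x, sign)]


W1 = [[1.0, -2.0], [0.5, 1.0]]
W2 = [[1.0, 0.0], [0.0, 3.0]]
w_out = [1.0, 1.0]

clear = [ClearLayer(W1), ClearLayer(W2)]
print(input_gradient(clear, w_out))  # → [2.5, 1.0]

shielded = [ShieldedLayer(W1), ClearLayer(W2)]
try:
    input_gradient(shielded, w_out)
except PermissionError as err:
    print("attacker blocked:", err)
```

In the shielded case the attacker can still obtain the gradient with respect to the shielded layer's output (here `[1.0, 3.0]`), but cannot map it back onto the input pixels without that layer's Jacobian.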