Split learning is a collaborative learning design that allows several participants (clients) to train a shared model while keeping their datasets private. Recent studies demonstrate that collaborative learning models, specifically federated learning, are vulnerable to security and privacy attacks such as model inference and backdoor attacks. Backdoor attacks are a group of poisoning attacks in which the attacker tries to control the model output by manipulating the model's training process. While there have been studies of inference attacks on split learning, split learning has not yet been tested against backdoor attacks. This paper performs a novel backdoor attack on split learning and studies its effectiveness. Unlike traditional backdoor attacks, which are carried out on the client side, we inject the backdoor trigger from the server side. For this purpose, we provide two attack methods, one using a surrogate client and another using an autoencoder, both of which poison the model via the incoming smashed data and the outgoing gradient sent back to the innocent participants. We conducted our experiments using three model architectures and three publicly available image datasets, running a total of 761 experiments to evaluate our attack methods. The results show that, despite strong trigger patterns and injection methods, split learning is highly robust and resistant to such poisoning attacks. While our best result is a 100% attack success rate on the MNIST dataset, in most other cases our attack shows little success as the cut layer is placed deeper in the network.
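To make the two messages mentioned above concrete, here is a minimal pure-Python sketch of vanilla split learning training, assuming a single client, a tiny two-layer network with the cut after the first layer, and labels available to the server. All names (`Client`, `Server`, `train`, `mean_loss`), the fixed initial weights, and the toy data are hypothetical and chosen only for illustration. The client sends its cut-layer activations (the smashed data) forward. The server returns the gradient of the loss with respect to those activations, and the client backpropagates it without any way to verify it. That returned gradient is the channel a server-side attacker can try to exploit. The sketch does not implement the paper's surrogate-client or autoencoder attacks.

```python
import math


def relu(v):
    return [max(0.0, a) for a in v]


class Client:
    """Holds the private inputs and the layers up to the cut layer (hypothetical)."""

    def __init__(self):
        # Fixed initial weights (4 cut-layer units, 2 inputs) so runs are reproducible.
        self.W = [[1.0, 0.0], [0.5, 0.1], [-0.5, 0.5], [0.3, 0.2]]

    def forward(self, x):
        self.x = x
        self.pre = [sum(w * xi for w, xi in zip(row, x)) for row in self.W]
        return relu(self.pre)  # "smashed data": all the server ever sees of x

    def backward(self, grad_smashed, lr):
        # The client cannot check this gradient; it simply backpropagates it.
        for row, pre, g in zip(self.W, self.pre, grad_smashed):
            g = g if pre > 0 else 0.0  # derivative of ReLU
            for i, xi in enumerate(self.x):
                row[i] -= lr * g * xi


class Server:
    """Holds the layers after the cut (one logistic unit here) and the loss."""

    def __init__(self):
        self.w = [0.1, -0.1, 0.1, 0.05]
        self.b = 0.0

    def predict(self, smashed):
        z = sum(wi * si for wi, si in zip(self.w, smashed)) + self.b
        return 1.0 / (1.0 + math.exp(-z))

    def step(self, smashed, y, lr):
        p = min(max(self.predict(smashed), 1e-12), 1 - 1e-12)
        loss = -(y * math.log(p) + (1 - y) * math.log(1 - p))
        dz = p - y  # dLoss/dz for sigmoid + binary cross-entropy
        grad_smashed = [dz * wi for wi in self.w]  # uses w before the update
        self.w = [wi - lr * dz * si for wi, si in zip(self.w, smashed)]
        self.b -= lr * dz
        return loss, grad_smashed


def mean_loss(client, server, data):
    total = 0.0
    for x, y in data:
        p = min(max(server.predict(client.forward(x)), 1e-12), 1 - 1e-12)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(data)


def train(client, server, data, epochs=200, lr=0.1):
    for _ in range(epochs):
        for x, y in data:
            smashed = client.forward(x)            # client -> server
            _, grad = server.step(smashed, y, lr)  # server -> client
            client.backward(grad, lr)


if __name__ == "__main__":
    # Second feature is a constant 1.0 acting as a client-side bias.
    data = [([0.0, 1.0], 0), ([0.2, 1.0], 0), ([0.8, 1.0], 1), ([1.0, 1.0], 1)]
    client, server = Client(), Server()
    before = mean_loss(client, server, data)
    train(client, server, data)
    print(f"mean loss {before:.3f} -> {mean_loss(client, server, data):.3f}")
```

The two halves only exchange `smashed` and `grad`, so the raw inputs never leave the client. An honest server computes `grad` from the true loss, whereas a malicious server is free to return any vector of the same shape.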