Split learning enables collaborative training of deep learning models while preserving data privacy and model security: raw data and model details are never shared directly, since the server and clients each hold only a partial sub-network and exchange intermediate computations. However, existing research has mainly examined its reliability for privacy protection, with little investigation into model security. Specifically, attackers who can exploit a full model can launch adversarial attacks, and split learning can mitigate this severe threat by disclosing only part of the model to untrusted servers. This paper evaluates the robustness of split learning against adversarial attacks, particularly in the most challenging setting where untrusted servers have access only to the intermediate layers of the model. Existing adversarial attacks mostly target the centralized setting rather than the collaborative one. To better evaluate the robustness of split learning, we therefore develop a tailored attack called SPADV, which comprises two stages: 1) shadow model training, which addresses the attacker's lack of part of the model, and 2) local adversarial attack, which produces adversarial examples for the evaluation. The first stage requires only a small amount of unlabeled non-IID data, and in the second stage SPADV perturbs the intermediate output of natural samples to craft adversarial ones. The overall cost of the proposed attack is relatively low, yet its empirical effectiveness is high, demonstrating the surprising vulnerability of split learning to adversarial attacks.
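To make the second stage more concrete, the following is a minimal sketch of perturbing an intermediate output, not the SPADV procedure itself, whose details the abstract does not give. It assumes a hypothetical linear shadow head (`W`, `b`) standing in for the part of the model the attacker lacks, and swaps in a single sign-gradient (FGSM-style) step as the perturbation rule. All names, values, and the budget `eps` are illustrative assumptions.

```python
import math

def softmax(z):
    # Numerically stable softmax over a list of logits.
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def logits(W, b, h):
    # Hypothetical linear shadow head: z = W h + b.
    return [sum(w * x for w, x in zip(row, h)) + b_i for row, b_i in zip(W, b)]

def cross_entropy(W, b, h, y):
    # Loss of the shadow head on intermediate output h for label y.
    return -math.log(softmax(logits(W, b, h))[y])

def grad_wrt_h(W, b, h, y):
    # For softmax cross-entropy, dL/dz = softmax(z) - onehot(y),
    # and for a linear head dL/dh = W^T (dL/dz).
    dz = softmax(logits(W, b, h))
    dz[y] -= 1.0
    return [sum(dz[i] * W[i][j] for i in range(len(W))) for j in range(len(h))]

def sign(x):
    return (x > 0) - (x < 0)

def perturb_intermediate(W, b, h, y, eps):
    # One sign-gradient step on the intermediate output (an assumed rule,
    # chosen for illustration), bounded by eps in the L-infinity norm.
    g = grad_wrt_h(W, b, h, y)
    return [x + eps * sign(gx) for x, gx in zip(h, g)]

def predict(W, b, h):
    z = logits(W, b, h)
    return z.index(max(z))

# Toy values (assumptions): a 2-class shadow head over a 3-dim intermediate output.
W = [[1.0, -0.5, 0.2], [-0.3, 0.8, 0.1]]
b = [0.0, 0.0]
h = [0.6, 0.1, -0.2]    # intermediate output of a natural sample
y = predict(W, b, h)    # no labels needed: use the shadow head's own prediction
eps = 0.5

h_adv = perturb_intermediate(W, b, h, y, eps)
loss_before = cross_entropy(W, b, h, y)
loss_after = cross_entropy(W, b, h_adv, y)
print(predict(W, b, h), predict(W, b, h_adv), loss_before, loss_after)
```

Because the shadow head here is linear, the loss is convex in `h`, so a sign-gradient step cannot decrease it. With a deep shadow model this guarantee no longer holds, and an iterative attack would typically replace the single step.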