On the Robustness of Split Learning against Adversarial Attacks

Split learning enables collaborative training of deep learning models while preserving data privacy and model security by avoiding direct sharing of raw data and model details (i.e., the server and clients hold only partial sub-networks and exchange intermediate computations). However, existing research has mainly examined its reliability for privacy protection, with little investigation into model security. Specifically, attackers with access to a full model can launch adversarial attacks, and split learning can mitigate this severe threat by disclosing only part of the model to untrusted servers. This paper aims to evaluate the robustness of split learning against adversarial attacks, particularly in the most challenging setting where untrusted servers only have access to the intermediate layers of the model. Existing adversarial attacks mostly focus on the centralized setting rather than the collaborative setting. To better evaluate the robustness of split learning, we therefore develop a tailored attack called SPADV, which comprises two stages: 1) shadow model training, which addresses the lack of part of the model, and 2) local adversarial attack, which produces adversarial examples for the evaluation. The first stage requires only a small amount of unlabeled non-IID data, and in the second stage SPADV perturbs the intermediate output of natural samples to craft adversarial ones. The overall cost of the proposed attack is relatively low, yet its empirical effectiveness is high, demonstrating the surprising vulnerability of split learning to adversarial attacks.
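
To make the second stage more concrete, the following is a minimal sketch of perturbing an intermediate output, not the SPADV implementation itself. It assumes a shadow model has already been trained and, purely for illustration, reduces it to a hypothetical linear head `shadow_head` standing in for the client-side layers the server cannot see. The single sign-gradient step, the use of the shadow model's own prediction as the label, and the names `perturb_intermediate`, `predict`, and `eps` are all assumptions of this sketch rather than details given in the abstract.

```python
import math

def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def predict(head, h):
    """Class index with the highest logit under a linear head."""
    logits = matvec(head, h)
    return max(range(len(logits)), key=logits.__getitem__)

def sign(v):
    return (v > 0) - (v < 0)

def perturb_intermediate(h, shadow_head, eps):
    """One sign-gradient step on an intermediate activation h.

    shadow_head is a hypothetical linear stand-in for the unseen
    client-side output layers. The shadow prediction is used as the
    label, on the assumption that the server has no ground truth.
    """
    probs = softmax(matvec(shadow_head, h))
    label = predict(shadow_head, h)
    # Gradient of cross-entropy w.r.t. the logits: softmax - onehot(label).
    g_logits = [p - (1.0 if k == label else 0.0) for k, p in enumerate(probs)]
    # Backpropagate through the linear head: grad_h = W^T g_logits.
    g_h = [sum(shadow_head[k][j] * g_logits[k] for k in range(len(shadow_head)))
           for j in range(len(h))]
    # Step in the direction that increases the loss, bounded by eps per coordinate.
    return [hj + eps * sign(gj) for hj, gj in zip(h, g_h)]

# Toy two-class example with a two-dimensional intermediate output.
shadow_head = [[1.0, -1.0], [-1.0, 1.0]]
h = [0.3, 0.1]
h_adv = perturb_intermediate(h, shadow_head, eps=0.2)
```

In this toy setting the perturbed activation changes the shadow head's prediction from class 0 to class 1. Whether such a change transfers to the real client-side layers depends on how closely the shadow model approximates them, which is the role the abstract assigns to the first stage.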
