Split Learning (SL) offers a framework for collaborative model training that respects data privacy by letting participants train jointly over the same set of samples while each holds its own distinct subset of features. However, SL is susceptible to backdoor attacks, in which malicious clients subtly alter their embeddings to insert hidden triggers that compromise the final trained model. To address this vulnerability, we introduce SecureSplit, a defense mechanism tailored to SL. SecureSplit applies a dimensionality transformation strategy to accentuate subtle differences between benign and poisoned embeddings, facilitating their separation. Building on this enhanced distinction, we develop an adaptive filtering approach that uses a majority-based voting scheme to remove contaminated embeddings while preserving clean ones. Extensive experiments across four datasets (CIFAR-10, MNIST, CINIC-10, and ImageNette), five backdoor attack scenarios, and seven baseline defenses confirm the effectiveness of SecureSplit under a range of challenging conditions.
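The two-step idea in the abstract can be illustrated with a toy sketch. This is not the paper's actual algorithm: here we assume PCA as the dimensionality transformation, repeated 2-means clustering as the separator, and a majority vote across random restarts that flags the minority cluster as poisoned. All names and parameters below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic embeddings: 90 benign, 10 poisoned (shifted by a trigger offset).
benign = rng.normal(0.0, 1.0, size=(90, 32))
poisoned = rng.normal(0.0, 1.0, size=(10, 32)) + 3.0
X = np.vstack([benign, poisoned])
true_poison = np.arange(len(X)) >= 90

def pca_project(X, k=2):
    """Dimensionality transformation: project onto the top-k principal components."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T

def minority_flags(Z, seed, iters=20):
    """One 2-means run; flag the smaller cluster as suspected poison."""
    r = np.random.default_rng(seed)
    centers = Z[r.choice(len(Z), 2, replace=False)].copy()
    for _ in range(iters):
        d = np.linalg.norm(Z[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for c in (0, 1):
            if (labels == c).any():
                centers[c] = Z[labels == c].mean(axis=0)
    minority = 0 if (labels == 0).sum() < (labels == 1).sum() else 1
    return labels == minority

Z = pca_project(X)
# Majority vote over 5 random restarts: keep a flag only if >= 3 runs agree.
votes = sum(minority_flags(Z, s).astype(int) for s in range(5))
flagged = votes >= 3

print("flagged:", flagged.sum(), "of which truly poisoned:", flagged[true_poison].sum())
```

In this contrived setup the projection amplifies the shift between the two embedding populations, so the vote isolates the poisoned minority; the real SecureSplit filtering would operate on actual client embeddings during training.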