Split Learning (SL) offers a framework for collaborative model training that respects data privacy by allowing participants to train jointly over the same set of samples while each holds a distinct set of features. However, SL is susceptible to backdoor attacks, in which malicious clients subtly alter their embeddings to insert hidden triggers that compromise the final trained model. To address this vulnerability, we introduce SecureSplit, a defense mechanism tailored to SL. SecureSplit applies a dimensionality transformation strategy that accentuates subtle differences between benign and poisoned embeddings, making them easier to separate. Building on this enhanced distinction, we develop an adaptive filtering approach that uses a majority-based voting scheme to remove contaminated embeddings while preserving clean ones. Rigorous experiments across four datasets (CIFAR-10, MNIST, CINIC-10, and ImageNette), five backdoor attack scenarios, and seven alternative defenses confirm the effectiveness of SecureSplit under various challenging conditions.
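The abstract does not specify the transformation or voting details, but the described pipeline (transform embeddings to amplify benign/poisoned differences, then filter by majority vote) can be sketched under assumptions. The sketch below is hypothetical: it stands in random low-dimensional projections for the paper's transformation strategy, uses distance-to-centroid as a per-view anomaly score, and discards embeddings flagged as outliers in a majority of views. The function name `filter_embeddings` and all parameters are illustrative, not the authors' implementation.

```python
import numpy as np

def filter_embeddings(embeddings, n_views=5, keep_frac=0.9, seed=0):
    """Hypothetical sketch of a majority-vote embedding filter.

    Each "view" applies a random dimensionality transformation
    (a stand-in for the paper's unspecified strategy), scores every
    embedding by its distance to the per-view centroid, and flags the
    farthest (1 - keep_frac) fraction as suspicious. An embedding is
    removed only if a strict majority of views flag it.
    Returns a boolean keep-mask over the rows of `embeddings`.
    """
    rng = np.random.default_rng(seed)
    n, d = embeddings.shape
    votes = np.zeros(n, dtype=int)
    centered = embeddings - embeddings.mean(axis=0)
    for _ in range(n_views):
        # Random projection into a lower-dimensional view.
        proj = rng.standard_normal((d, max(2, d // 4)))
        z = centered @ proj
        # Distance from the per-view centroid as an anomaly score.
        dist = np.linalg.norm(z - z.mean(axis=0), axis=1)
        thresh = np.quantile(dist, keep_frac)
        votes += (dist > thresh).astype(int)
    # Keep embeddings flagged in at most half of the views.
    return votes <= n_views // 2
```

A quick usage pattern: stack the cut-layer embeddings from all clients into one matrix, apply the filter, and forward only the kept rows to the server-side model. The majority vote makes a single unlucky projection insufficient to discard a clean embedding.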