Launching effective malicious attacks in VFL presents unique challenges: 1) given the distributed nature of clients' data features and models, each client rigorously guards its privacy and prohibits direct querying, complicating any attempt to steal data; 2) existing malicious attacks alter the underlying VFL training task, and are hence easily detected by comparing the received gradients with those received in honest training. To overcome these challenges, we develop URVFL, a novel attack strategy that evades current detection mechanisms. The key idea is to integrate a discriminator with an auxiliary classifier that takes full advantage of the label information and generates malicious gradients for the victim clients: on one hand, label information helps to better characterize the embeddings of samples from distinct classes, yielding improved reconstruction performance; on the other hand, computing malicious gradients with label information better mimics honest training, making the malicious gradients indistinguishable from honest ones and the attack much more stealthy. Our comprehensive experiments demonstrate that URVFL significantly outperforms existing attacks and successfully circumvents SOTA detection methods for malicious attacks. Additional ablation studies and evaluations against defenses further underscore the robustness and effectiveness of URVFL. Our code will be available at https://github.com/duanyiyao/URVFL.
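To make the key idea concrete, the following is a minimal, hypothetical sketch (not the paper's implementation) of how an auxiliary classifier can turn label information into a malicious gradient for a victim's embedding. For simplicity the classifier is a single linear layer with a cross-entropy loss; all names (`W_c`, `malicious_gradient`, dimensions) are illustrative assumptions. Because the gradient is driven by the true label, it has the same form as an honest task gradient.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

d, k = 8, 3                           # embedding dim, number of classes (illustrative)
W_c = rng.normal(size=(d, k)) * 0.1   # auxiliary classifier weights (hypothetical)

def malicious_gradient(embedding, label):
    """Gradient of the auxiliary cross-entropy loss w.r.t. the victim's
    embedding. Since it is computed from the true label, it mimics the
    shape and direction of an honest training gradient."""
    probs = softmax(embedding @ W_c)          # class probabilities
    one_hot = np.eye(k)[label]                # true-label one-hot vector
    # d(CE)/d(embedding) = W_c @ (probs - one_hot)
    return W_c @ (probs - one_hot)

emb = rng.normal(size=d)                      # stand-in victim embedding
g = malicious_gradient(emb, label=1)
print(g.shape)  # (8,)
```

In a full attack, the attacker would also train a discriminator on these embeddings and update `W_c` alongside it; this sketch only shows the label-conditioned gradient that is returned to the victim in place of the honest one.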