Vertical federated learning (VFL) has recently gained prominence as a paradigm for processing data distributed across many individual sources without centralizing it. Multiple participants collaboratively train models on their local data in a privacy-aware manner. To date, VFL has become a de facto solution for securely learning a model across organizations, allowing knowledge to be shared without compromising the privacy of any individual. Despite the prosperous development of VFL systems, we find that certain inputs of a participant, named adversarial dominating inputs (ADIs), can steer the joint inference in the direction of the adversary's will and force other (victim) participants to make negligible contributions, forfeiting rewards that are usually allocated according to the importance of each participant's contribution in federated learning scenarios. We conduct a systematic study of ADIs, first proving their existence in typical VFL systems. We then propose gradient-based methods to synthesize ADIs of various formats and exploit common VFL systems. We further launch greybox fuzz testing, guided by the saliency scores of ``victim'' participants, to perturb adversary-controlled inputs and systematically explore the VFL attack surface in a privacy-preserving manner. We also conduct an in-depth study of how critical parameters and settings influence ADI synthesis. Our study reveals new VFL attack opportunities, promoting the identification of unknown threats before breaches and the construction of more secure VFL systems.
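To make the gradient-based synthesis idea concrete, the following is a minimal toy sketch, not the paper's actual algorithm: it assumes a hypothetical two-party VFL with linear bottom models whose partial logits are summed and passed through a sigmoid top model (all weights below are made up). The adversary performs gradient descent on its own features so that the joint prediction saturates at the target class regardless of what the victim contributes.

```python
import math
import random

# Hypothetical two-party VFL: each party maps its features through a linear
# "bottom model"; the server sums the partial logits and applies a sigmoid
# "top model". All weights here are illustrative toy values, not learned ones.
W_ADV = [0.8, -0.5, 1.2]   # assumed bottom-model weights, adversary side
W_VIC = [1.0, 0.7, -0.9]   # assumed bottom-model weights, victim side

def joint_logit(x_adv, x_vic):
    return (sum(w * x for w, x in zip(W_ADV, x_adv))
            + sum(w * x for w, x in zip(W_VIC, x_vic)))

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def synthesize_adi(x_adv, target=1.0, lr=0.5, steps=200):
    """Gradient descent on the adversary's features so the joint prediction
    saturates at `target` for ANY victim input drawn from a plausible range."""
    x = list(x_adv)
    for _ in range(steps):
        # Sample a random victim input: the ADI must dominate regardless of it.
        x_vic = [random.uniform(-1, 1) for _ in range(3)]
        p = sigmoid(joint_logit(x, x_vic))
        # For a sigmoid top model with log loss, d(loss)/d(x_adv) = (p - target) * W_ADV.
        x = [xi - lr * (p - target) * wi for xi, wi in zip(x, W_ADV)]
    return x

random.seed(0)
adi = synthesize_adi([0.1, 0.2, 0.3])
# A victim input that pushes the prediction toward 0 as hard as possible:
worst_vic = [-1.0 if w > 0 else 1.0 for w in W_VIC]
```

After synthesis, the joint prediction stays near the adversary's target class even against the most unfavorable victim input, which is exactly the "dominating" behavior: the victim's features barely move the output.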
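The saliency-guided greybox fuzzing can likewise be sketched in miniature (again a hypothetical setup, not the paper's exact procedure): mutate the adversary-controlled features at random, estimate the victim's saliency by finite differences on the joint output, and keep only mutants that shrink it, so the victim's contribution becomes negligible.

```python
import math
import random

# Toy greybox-fuzzing loop on the same hypothetical two-party linear VFL;
# the weights are illustrative toy values, not learned ones.
W_ADV = [0.8, -0.5, 1.2]   # assumed bottom-model weights, adversary side
W_VIC = [1.0, 0.7, -0.9]   # assumed bottom-model weights, victim side

def predict(x_adv, x_vic):
    z = (sum(w * x for w, x in zip(W_ADV, x_adv))
         + sum(w * x for w, x in zip(W_VIC, x_vic)))
    return 1.0 / (1.0 + math.exp(-z))

def victim_saliency(x_adv, x_vic, eps=1e-3):
    """Finite-difference sensitivity of the joint output to victim features."""
    base = predict(x_adv, x_vic)
    total = 0.0
    for i in range(len(x_vic)):
        bumped = list(x_vic)
        bumped[i] += eps
        total += abs(predict(x_adv, bumped) - base) / eps
    return total

def fuzz(x_adv, x_vic, rounds=300, step=0.3):
    """Random mutation of adversary features; a mutant is kept only when it
    lowers the victim's saliency (the greybox feedback signal)."""
    best, best_s = list(x_adv), victim_saliency(x_adv, x_vic)
    for _ in range(rounds):
        cand = [xi + random.uniform(-step, step) for xi in best]
        s = victim_saliency(cand, x_vic)
        if s < best_s:
            best, best_s = cand, s
    return best, best_s

random.seed(1)
x_vic = [0.4, -0.2, 0.5]
start = [0.1, 0.2, 0.3]
adi, final_s = fuzz(start, x_vic)
```

Note that the feedback signal needs only the joint prediction and perturbed queries, not the victim's raw features or model, which is what lets the exploration stay privacy-preserving in spirit.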