ADI: Adversarial Dominating Inputs in Vertical Federated Learning Systems

Vertical federated learning (VFL) systems have recently become prominent as a way to process data distributed across many individual sources without centralizing it. Multiple participants collaboratively train models on their local data in a privacy-aware manner. To date, VFL has become a de facto solution for securely learning a model among organizations, allowing knowledge to be shared without compromising the privacy of any individual. Despite the rapid development of VFL systems, we find that certain inputs of a participant, named adversarial dominating inputs (ADIs), can steer the joint inference in the direction the adversary wants and force the other (victim) participants to make negligible contributions, so that they lose the rewards usually offered in proportion to the importance of their contributions in federated learning scenarios. We conduct a systematic study of ADIs by first proving their existence in typical VFL systems. We then propose gradient-based methods to synthesize ADIs of various formats and exploit common VFL systems. We further launch greybox fuzz testing, guided by the saliency score of "victim" participants, to perturb adversary-controlled inputs and systematically explore the VFL attack surface in a privacy-preserving manner. We also conduct an in-depth study of how critical parameters and settings influence ADI synthesis. Our study reveals new VFL attack opportunities, helping to identify unknown threats before breaches occur and to build more secure VFL systems.
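To make the idea of a dominating input concrete, the following is a minimal sketch and not the method from the paper. It assumes a toy two-party VFL setting in which the joint model is a logistic regression over the concatenated features of an adversary and a victim, with every feature bounded in [-1, 1]. The weights `W_A` and `W_B`, the function names, and the step size are all hypothetical choices made for illustration. The adversary runs projected gradient ascent on its own features so that the joint prediction lands on its target class regardless of what the victim contributes.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def joint_predict(x_a, x_b, w_a, w_b, bias=0.0):
    """Joint inference of a toy two-party VFL logistic model.

    x_a: adversary features, shape (d_a,)
    x_b: victim features, shape (d_b,) or (n, d_b) for a batch
    Returns the probability of class 1 (the adversary's target here).
    """
    return sigmoid(x_a @ w_a + x_b @ w_b + bias)


def synthesize_dominating_input(w_a, w_b, victim_samples, bias=0.0,
                                steps=200, lr=0.5):
    """Projected gradient ascent on the adversary's features (illustrative).

    Maximizes the mean log-probability of class 1 over sampled victim
    inputs, keeping every adversary feature inside the box [-1, 1].
    """
    x_a = np.zeros_like(w_a)
    for _ in range(steps):
        p = joint_predict(x_a, victim_samples, w_a, w_b, bias)
        # d/dx_a of mean(log p) for a logistic model is mean(1 - p) * w_a.
        grad = np.mean(1.0 - p) * w_a
        x_a = np.clip(x_a + lr * grad, -1.0, 1.0)
    return x_a


# Hypothetical weights: the adversary's features carry more weight than
# the victim's, which is what makes domination possible in this toy case.
W_A = np.array([2.0, -1.5, 1.0])
W_B = np.array([0.5, -0.3])

rng = np.random.default_rng(0)
victims = rng.uniform(-1.0, 1.0, size=(1000, 2))

adi = synthesize_dominating_input(W_A, W_B, victims)
benign = np.zeros(3)

for name, x_a in [("benign", benign), ("synthesized", adi)]:
    preds = joint_predict(x_a, victims, W_A, W_B)
    # The spread of the prediction over victim inputs is a crude proxy
    # for how much the victim's features still matter.
    print(name, "prediction spread over victim inputs:",
          preds.max() - preds.min())
```

In this sketch the spread of the joint prediction across victim inputs serves as a rough stand-in for the victim's contribution: with the neutral adversary input the victim's features decide the predicted class, while with the synthesized input the predicted class is the same for every victim input in the box. Real VFL systems use richer models and contribution measures, so this only illustrates the phenomenon the abstract describes.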


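Federated Learning

Federated learning is an emerging foundational AI technique, first proposed by Google in 2016, originally to solve the problem of updating models locally for Android phone users. Its design goal is to enable efficient machine learning across multiple participants or compute nodes while ensuring information security during large-scale data exchange, protecting device and personal data privacy, and maintaining legal compliance. The machine learning algorithms usable in federated learning are not limited to neural networks and also include important algorithms such as random forests. Federated learning is expected to become the foundation of next-generation collaborative AI algorithms and collaborative networks.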
