As a crucial building block of vertical Federated Learning (vFL), Split Learning (SL) has proven practical for two-party collaborative model training, where one party holds the features of the data samples and the other party holds the corresponding labels. The method is claimed to be private because the parties share only embedding vectors and gradients, not the private raw data and labels. However, recent work has shown that the private labels can be leaked through the gradients. These existing attacks work only in the classification setting, where the private labels are discrete. In this work, we go a step further and study leakage in the regression setting, where the private labels are continuous values. Because the output range is unbounded, previous attacks have difficulty inferring continuous labels. To address this limitation, we propose a novel learning-based attack that combines gradient information with additional regularization objectives derived from properties of model training, and that can effectively infer labels in regression settings. Comprehensive experiments on various datasets and models demonstrate the effectiveness of the proposed attack. We hope our work paves the way for future analyses that make the vFL framework more secure.
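To make the setting concrete, the following is a minimal sketch of one two-party split-learning step for regression. It is not the attack proposed in this work. The linear bottom model, the linear head, the mean-squared-error loss, and all names (`split_learning_step`, `bottom_w`, `head_w`, `head_b`) are assumptions chosen for illustration. Under these assumptions, the gradient returned to the feature party for each sample is the label party's head weights scaled by that sample's residual, so it carries label information without revealing the label directly.

```python
import numpy as np


def split_learning_step(x, y, bottom_w, head_w, head_b):
    """One forward/backward pass of a toy two-party split-learning regression.

    Assumed setup (illustrative only): linear bottom model on the feature
    party, linear head and MSE loss on the label party.
    """
    # Feature party: compute embeddings and send them to the label party.
    z = x @ bottom_w                          # shape (n, d)

    # Label party: predict and compare against the private continuous labels.
    y_hat = z @ head_w + head_b               # shape (n,)
    residual = y_hat - y                      # shape (n,)
    loss = np.mean(residual ** 2)

    # Label party: gradient of the loss w.r.t. the embeddings, sent back.
    # For MSE with a linear head: dL/dz_i = (2/n) * (y_hat_i - y_i) * head_w.
    grad_z = (2.0 / len(y)) * residual[:, None] * head_w[None, :]
    return z, grad_z, loss, residual


rng = np.random.default_rng(0)
n, p, d = 8, 5, 3                             # samples, raw features, embedding size
x = rng.normal(size=(n, p))                   # private features (feature party)
y = rng.normal(loc=10.0, scale=4.0, size=n)   # private labels (label party)
bottom_w = rng.normal(size=(p, d))
head_w = rng.normal(size=d)
head_b = 0.0

z, grad_z, loss, residual = split_learning_step(x, y, bottom_w, head_w, head_b)

# Projecting each per-sample gradient onto head_w recovers the scaled residual.
# The feature party does not know head_w or y_hat, so this is not yet an attack.
scale = grad_z @ head_w / (head_w @ head_w)
```

In this toy case, every row of `grad_z` lies along a single direction, and only the per-sample scalar varies with the label. An attacker on the feature side sees `z` and `grad_z` but not `head_w`, `y_hat`, or `y`. In regression the residual is unbounded, which is consistent with the difficulty described above.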