Vertical federated learning (VFL) enables multiple parties that hold disjoint features for a common set of users to train a machine learning model without sharing their private data. Tree-based models have become prevalent in VFL due to their interpretability and efficiency. However, the vulnerability of tree-based VFL has not been sufficiently investigated. In this study, we first introduce a novel label inference attack, ID2Graph, which uses the sets of record IDs assigned to each node (i.e., the instance space) to deduce private training labels. The ID2Graph attack generates a graph structure from training samples, extracts communities from the graph, and clusters the local dataset using community information. To counteract label leakage from the instance space, we propose an effective defense mechanism, ID-LMID, which prevents label leakage through mutual information regularization. Comprehensive experiments on various datasets reveal that the ID2Graph attack poses significant risks to tree-based models such as Random Forest and XGBoost. Further evaluations on these benchmarks demonstrate that ID-LMID effectively mitigates label leakage in these settings.
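The three attack steps described above (graph construction, community extraction, use of community information on the local dataset) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `min_weight` threshold, and the toy data are hypothetical, edge weights are plain co-occurrence counts, and connected components over thresholded edges are used as a simple stand-in for a real community detection method. The final clustering step is only prepared here, by appending a one-hot community indicator to each local feature vector.

```python
from collections import defaultdict
from itertools import combinations


def build_cooccurrence_graph(instance_spaces):
    """Weight each pair of record IDs by the number of tree nodes containing both."""
    weights = defaultdict(int)
    for ids in instance_spaces:
        for u, v in combinations(sorted(set(ids)), 2):
            weights[(u, v)] += 1
    return weights


def extract_communities(record_ids, weights, min_weight=2):
    """Group record IDs into connected components over edges with weight >= min_weight.

    Assumption: this replaces a proper community detection algorithm
    purely to keep the example dependency-free.
    """
    parent = {r: r for r in record_ids}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (u, v), w in weights.items():
        if w >= min_weight:
            ru, rv = find(u), find(v)
            if ru != rv:
                # Attach the larger root under the smaller one.
                parent[max(ru, rv)] = min(ru, rv)

    roots = sorted({find(r) for r in record_ids})
    index = {root: i for i, root in enumerate(roots)}
    return {r: index[find(r)] for r in record_ids}


def augment_local_features(local_features, communities):
    """Append a one-hot community indicator to each record's local features."""
    n_communities = max(communities.values()) + 1
    augmented = {}
    for rid, feats in local_features.items():
        one_hot = [0.0] * n_communities
        one_hot[communities[rid]] = 1.0
        augmented[rid] = list(feats) + one_hot
    return augmented


# Hypothetical instance spaces observed by the attacker across two trees.
instance_spaces = [
    {0, 1, 2}, {3, 4, 5},        # tree 1, depth 1
    {0, 1}, {2}, {3, 4}, {5},    # tree 1, leaves
    {0, 2}, {1}, {3, 5}, {4},    # tree 2, leaves
]
record_ids = range(6)
local_features = {rid: [float(rid)] for rid in record_ids}

weights = build_cooccurrence_graph(instance_spaces)
communities = extract_communities(record_ids, weights)
augmented = augment_local_features(local_features, communities)
print(communities)
```

In this toy setting the augmented vectors could then be passed to any clustering routine; records that repeatedly share nodes end up with the same community indicator, which is the signal the attack relies on.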