Federated Learning (FL) is an emerging machine learning paradigm that enables multiple parties to collaboratively train models without sharing raw data, preserving data privacy. In Vertical FL (VFL), where each party holds different features for the same users, a key challenge is to evaluate each party's feature contribution in the early stages, before any model has been trained. To address this, the Shapley-CMI method was recently proposed as a model-free, information-theoretic approach to feature valuation based on Conditional Mutual Information (CMI). However, its original formulation did not provide a practical implementation capable of computing the required permutations and intersections securely. This paper presents a novel privacy-preserving implementation of Shapley-CMI for VFL. Our system introduces a private set intersection (PSI) server that performs all necessary feature permutations and computes encrypted intersection sizes across discretized and encrypted ID groups, without any exchange of raw data. Each party then uses these intersection results to compute Shapley-CMI values, which quantify the marginal utility of its features. Initial experiments confirm the correctness and privacy of the proposed system, demonstrating its viability for secure and efficient feature contribution estimation in VFL. This approach ensures data confidentiality, scales across multiple parties, and enables fair data valuation without sharing raw data or training models.
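
To make the role of intersection sizes concrete, the following is a minimal plaintext sketch, not the paper's implementation. It assumes the utility of a feature set S is the mutual information I(Y; X_S), so that the marginal contribution of a feature is the conditional mutual information I(Y; X_i | X_S), and it assumes the label's ID groups are available alongside the features' groups. Ordinary Python set intersections stand in for the encrypted intersection sizes that the PSI server would return, and all function names (`id_groups`, `joint_entropy`, `mutual_information`, `shapley_cmi`) are hypothetical. Enumerating every permutation is only practical for a handful of features.

```python
from itertools import permutations, product
from math import log2


def id_groups(column):
    """Map each discretized value to the set of row IDs that hold it."""
    groups = {}
    for row_id, value in enumerate(column):
        groups.setdefault(value, set()).add(row_id)
    return list(groups.values())


def joint_entropy(group_lists, n):
    """Entropy (in bits) of the joint variable, using only intersection sizes.

    In the real system these sizes would come from the PSI server;
    here they are computed in the clear (assumption for illustration).
    """
    if not group_lists:
        return 0.0
    h = 0.0
    for combo in product(*group_lists):
        size = len(set.intersection(*combo))
        if size:
            p = size / n
            h -= p * log2(p)
    return h


def mutual_information(feature_groups, label_groups, n):
    """I(Y; X_S) = H(Y) + H(X_S) - H(X_S, Y)."""
    if not feature_groups:
        return 0.0
    return (
        joint_entropy([label_groups], n)
        + joint_entropy(feature_groups, n)
        - joint_entropy(feature_groups + [label_groups], n)
    )


def shapley_cmi(features, labels):
    """Average each feature's marginal gain in I(Y; X_S) over all orderings."""
    n = len(labels)
    groups = {name: id_groups(col) for name, col in features.items()}
    label_groups = id_groups(labels)
    values = dict.fromkeys(features, 0.0)
    orders = list(permutations(features))
    for order in orders:
        prefix, before = [], 0.0
        for name in order:
            prefix.append(groups[name])
            after = mutual_information(prefix, label_groups, n)
            # after - before equals I(Y; X_name | features earlier in the order)
            values[name] += (after - before) / len(orders)
            before = after
    return values


# Toy data: the label copies feature "a"; feature "b" is independent of it.
toy_features = {"a": [0, 0, 1, 1], "b": [0, 1, 0, 1]}
toy_labels = [0, 0, 1, 1]
toy_values = shapley_cmi(toy_features, toy_labels)
```

On this toy table the sketch attributes the full 1 bit of label information to feature `a` and nothing to `b`, and the per-feature values sum to I(Y; X_a, X_b), which is the efficiency property expected of a Shapley decomposition.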


