Federated learning promises significant sample-efficiency gains by pooling data across multiple agents, yet incentive misalignment stands in the way: each update is costly to its contributor but benefits every participant. We introduce a game-theoretic framework that captures heterogeneous data: an agent's utility depends on who supplies each sample, not just on how many samples are supplied. Agents aim to meet a PAC-style accuracy threshold at minimal personal cost. We show that uncoordinated play yields pathologies: pure equilibria may not exist, and even the best equilibrium can be arbitrarily more costly than cooperation. To steer collaboration, we analyze the cost-minimizing contribution vector, prove that computing it is NP-hard, and derive a polynomial-time linear program that achieves a logarithmic approximation. Finally, pairing the LP with a simple pay-what-you-contribute rule, in which each agent receives a payment equal to its sample cost, yields a mechanism that is strategy-proof and unique within the class of contribution-based transfers.
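The pay-what-you-contribute rule can be illustrated with a minimal sketch. It assumes a linear per-sample cost for each agent and a contribution vector that is already given (for instance, from the LP); the names `Agent`, `cost_per_sample`, `sample_cost`, `pay_what_you_contribute`, and `net_costs` are hypothetical and chosen only for this illustration, and the numbers are made up. The sketch shows only the accounting identity behind the rule: each agent is paid exactly what its samples cost, so its net cost is zero. It does not model who funds the payments or why the rule is strategy-proof.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Agent:
    name: str
    cost_per_sample: float  # ASSUMPTION: cost is linear in the number of samples


def sample_cost(agent: Agent, n_samples: int) -> float:
    """Cost the agent incurs for contributing n_samples (assumed linear)."""
    return agent.cost_per_sample * n_samples


def pay_what_you_contribute(agents, contributions):
    """Payment to each agent equals the cost of the samples it contributes."""
    return {a.name: sample_cost(a, contributions[a.name]) for a in agents}


def net_costs(agents, contributions):
    """Sample cost minus payment for each agent; zero under this rule."""
    payments = pay_what_you_contribute(agents, contributions)
    return {
        a.name: sample_cost(a, contributions[a.name]) - payments[a.name]
        for a in agents
    }


# Hypothetical instance: two agents and an assumed contribution vector.
agents = [Agent("a", 1.0), Agent("b", 2.5)]
contributions = {"a": 40, "b": 0}

payments = pay_what_you_contribute(agents, contributions)
residual = net_costs(agents, contributions)
```

Under these assumptions, an agent that contributes nothing receives nothing, and an agent that contributes is exactly reimbursed, so no agent bears a net cost for following the prescribed contribution vector.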