Data Sharing with Endogenous Choices over Differential Privacy Levels

We study coalition formation for data sharing under differential privacy when agents have heterogeneous privacy costs. Each agent holds a sensitive data point and decides whether to participate in a data-sharing coalition and how much noise to add to their data. Privacy choices induce a fundamental trade-off: higher privacy reduces an individual's data-sharing cost but degrades data utility and statistical accuracy for the coalition. These choices generate externalities across agents, making both participation and privacy levels strategic. Our goal is to understand which coalitions are stable, how privacy choices shape equilibrium outcomes, and how decentralized data sharing compares to a centralized, socially optimal benchmark. We provide a comprehensive equilibrium analysis across a broad range of privacy-cost regimes, from costs that decrease with coalition size (e.g., privacy amplification from pooling data) to costs that increase with it (e.g., greater exposure to privacy attacks in larger coalitions). We first characterize Nash equilibrium coalitions with endogenous privacy levels and show that equilibria may fail to exist and can be non-monotonic in problem parameters. We also introduce a weaker notion, robust equilibrium, which gives players already in the coalition the power to prevent or veto external players from joining and therefore exists more widely, and we fully characterize such equilibria. Finally, for both Nash and robust equilibria, we analyze efficiency relative to the social optimum in terms of social welfare and estimator accuracy. We derive bounds that depend sharply on the number of players, properties of the cost profile, and how privacy costs scale with coalition size.
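The privacy-accuracy trade-off and the resulting externality can be illustrated with a minimal sketch. It is not the model analyzed in the paper; it rests on assumptions made here purely for illustration: each agent's data point lies in [0, 1], agent i perturbs it with Laplace noise of scale 1/epsilon_i (a standard local differential privacy mechanism for sensitivity 1), and the coalition estimates the mean by averaging the noisy reports. The names `laplace_noise`, `noisy_mean`, and `noise_variance` are hypothetical.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is Laplace-distributed with location 0 and that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_mean(data, epsilons, rng):
    # Assumption: each agent reports x_i + Laplace(1 / epsilon_i),
    # and the coalition averages the reports.
    reports = [x + laplace_noise(1.0 / eps, rng) for x, eps in zip(data, epsilons)]
    return sum(reports) / len(reports)


def noise_variance(epsilons):
    # Variance that privacy noise adds to the coalition's mean estimate:
    # Laplace(b) has variance 2 * b**2, here with b = 1 / epsilon_i.
    n = len(epsilons)
    return sum(2.0 / eps ** 2 for eps in epsilons) / n ** 2


rng = random.Random(0)
estimate = noisy_mean([0.2, 0.8], [1.0, 1.0], rng)

# Both agents choose epsilon = 1.
baseline = noise_variance([1.0, 1.0])
# The second agent chooses stronger privacy (epsilon = 0.5).
more_private = noise_variance([1.0, 0.5])
```

Under these assumptions, one agent's move to a smaller epsilon raises the variance of the estimate shared by every member of the coalition, which is the kind of externality that makes privacy levels strategic.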
