Modern data aggregation often involves a platform collecting data from a network of users who have various privacy options. Such a platform must decide how to allocate incentives that convince users to share their data. This paper proposes a \textit{fair} amount to compensate users for their data at a given privacy level, based on an axiomatic definition of fairness along the lines of the celebrated Shapley value. To the best of our knowledge, these are the first fairness concepts for data that explicitly account for privacy constraints. We also formulate a heterogeneous federated learning problem in which the platform offers users a choice of privacy levels. By studying this problem, we investigate how the compensation users receive under fair allocations varies with their privacy level, their amount of data, and the degree of heterogeneity. We also examine the platform's behavior when it is required to design fair incentives. Under certain conditions, we find that when privacy sensitivity is low, the platform sets incentives so that it collects all of the users' data at the lowest privacy level. When privacy sensitivity exceeds a given threshold, the platform provides no incentives to users. Between these two extremes, the platform sets incentives so that some fraction of the users choose the higher privacy option and the rest choose the lower privacy option.