We study a two-sided online data ecosystem composed of an online platform, the platform's users, and downstream learners or data buyers. The learners can buy user data on the platform (to run a statistical or machine learning task). Potential users decide whether to join by weighing i) the benefit they get from joining the platform and interacting with other users against ii) the privacy costs they incur from sharing their data. First, we introduce a novel modeling element for two-sided data platforms: users' privacy costs are endogenous and depend on how much of their data is purchased by the downstream learners. Then, we characterize marketplace equilibria in certain simple settings. In particular, we provide a full characterization in two variants of our model that correspond to different utility functions for the users: i) when each user gets a constant benefit for participating in the platform and ii) when each user's benefit is linearly increasing in the number of other users that participate. In both variants, equilibria in our setting differ significantly from equilibria when privacy costs are exogenous and fixed, highlighting the importance of accounting for endogeneity in the privacy costs. Finally, we provide simulations and semi-synthetic experiments that extend our results to more general settings: we experiment with different distributions of users' privacy costs and different functional forms of the users' utilities for joining the platform.