The design of data markets has gained importance as firms increasingly use machine learning models fueled by externally acquired training data. A key consideration is the externalities firms face when data, though inherently freely replicable, is allocated to competing firms. In this setting, we demonstrate that a data seller's optimal revenue increases when firms can pay to prevent allocations to others. To do so, we first reduce the combinatorial problem of allocating and pricing multiple datasets to the auction of a single digital good by modeling a firm's utility for data as the increase in prediction accuracy the data provides. We then derive welfare-maximizing and revenue-maximizing mechanisms, highlighting how the form of firms' private information (whether a firm knows the externalities it exerts on others, or vice versa) affects the resulting structures. In all cases, under appropriate assumptions, the optimal allocation rule is a single threshold per firm, where either all data is allocated or none is.