The design of data markets has gained importance as firms increasingly use machine learning models fueled by externally acquired training data. A key consideration is the externalities firms face when data, though inherently freely replicable, is allocated to competing firms. In this setting, we demonstrate that a data seller's optimal revenue increases when firms can pay to prevent allocations to others. To do so, we first reduce the combinatorial problem of allocating and pricing multiple datasets to the auction of a single digital good by modeling a firm's utility for data as the increase in prediction accuracy it provides. We then derive welfare-maximizing and revenue-maximizing mechanisms, highlighting how the form of firms' private information (whether the externalities a firm exerts on others are known, or vice versa) affects the resulting structures. In all cases, under appropriate assumptions, the optimal allocation rule is a single threshold per firm, where either all data is allocated or none is.