Driven by the rising value of data in applications such as advertising, finance, and machine learning, markets for data products have become increasingly important. Data markets mainly sell two kinds of products: datasets and machine learning models. Because these products can be replicated at negligible marginal cost, sellers naturally version them, through query access to datasets and through noisy releases of models. Versioning immediately raises an arbitrage problem: a buyer may combine several cheaper purchases to recover a more informative product at a lower total price. Existing work on query and model pricing studies arbitrage-freeness but treats buyer values as exogenous, whereas the literature on selling information derives value from the buyer's decision problem but ignores arbitrage-freeness. We therefore study the seller's optimal data pricing problem in which buyers value data through Bayesian decision making and prices must satisfy arbitrage-freeness constraints. We first interpret query and model pricing as special cases of information pricing, formulate the general arbitrage-free information-selling problem, show that it is computationally hard, and give a branch-and-bound algorithm based on McCormick relaxations. We then consider threshold utilities, under which a buyer obtains positive value if and only if the experiment is sufficiently informative. Under this condition, we find that arbitrage-freeness can be characterized by Blackwell dominance, which in turn unifies the arbitrage-free conditions for query pricing \cite{deep2017design} and model pricing \cite{chen2019towards}. Finally, we characterize revenue-maximizing pricing under restricted query and model menus.
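To make the arbitrage problem concrete, the following Python sketch assumes, purely for illustration, a model-pricing setting in which each version is the true model plus independent Gaussian noise and is priced by a function of its precision $t = 1/\text{variance}$. Averaging two independent releases with inverse-variance weights yields precision $t_1 + t_2$, so a price function admits arbitrage whenever $p(t_1) + p(t_2) < p(t_1 + t_2)$. A non-monotone price also admits arbitrage, since a buyer can add noise to a cheaper but more precise release. The helper names (`combined_precision`, `find_arbitrage`, `is_monotone`) are hypothetical. The grid search only checks the listed precision levels and is not the paper's algorithm.

```python
import math
from itertools import product
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical setting (illustration only): each model version is the true
# model plus independent Gaussian noise, priced by its precision t = 1/variance.
Price = Callable[[float], float]
EPS = 1e-12  # tolerance for floating-point comparisons


def combined_precision(precisions: Sequence[float]) -> float:
    """Precision of the inverse-variance-weighted average of independent,
    unbiased noisy releases: precisions add."""
    return float(sum(precisions))


def find_arbitrage(price: Price, grid: Sequence[float]) -> Optional[Tuple[float, float]]:
    """Return a pair (t1, t2) from the grid for which buying both releases and
    averaging them is strictly cheaper than buying precision t1 + t2 directly,
    or None if no such pair exists on the grid."""
    for t1, t2 in product(grid, repeat=2):
        if price(t1) + price(t2) < price(combined_precision([t1, t2])) - EPS:
            return (t1, t2)
    return None


def is_monotone(price: Price, grid: Sequence[float]) -> bool:
    """Check that a more precise version never costs less on the grid
    (otherwise a buyer could buy it and add noise to degrade it)."""
    pts = sorted(grid)
    return all(price(a) <= price(b) + EPS for a, b in zip(pts, pts[1:]))


if __name__ == "__main__":
    grid = [1.0, 2.0, 3.0, 4.0]
    print(find_arbitrage(lambda t: t ** 2, grid))  # (1.0, 1.0): 1 + 1 < 4
    print(find_arbitrage(math.sqrt, grid))         # None: sqrt is subadditive
    print(is_monotone(math.sqrt, grid))            # True
```

A convex price such as $p(t) = t^2$ is exploitable (two precision-1 releases cost 2 but replace a precision-2 release priced at 4), whereas a concave, increasing price such as $p(t) = \sqrt{t}$ passes both checks on the grid.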