Arbitrage-free Data Pricing

Driven by the rising value of data in applications such as advertising, finance, and machine learning, markets for data products have become increasingly important. Data markets mainly sell two kinds of products: datasets and machine learning models. Because these products can be replicated at negligible marginal cost, sellers naturally version them through query access and noisy model releases. Versioning immediately raises an arbitrage problem: a buyer may combine several cheaper purchases to recover a more informative product at a lower total price. Existing work on query and model pricing studies arbitrage-freeness while treating buyer values as exogenous, whereas the literature on selling information derives value from the buyer's decision problem but ignores arbitrage-freeness. We therefore study the seller's optimal data pricing problem when buyers value data through Bayesian decision making and prices must satisfy arbitrage-freeness constraints. We first interpret query and model pricing as special cases of information pricing and formulate the general arbitrage-free information selling problem; we show that it is computationally hard and give a branch-and-bound algorithm based on McCormick relaxations. We then consider threshold utilities, under which a buyer has positive value if and only if the experiment is sufficiently informative. Under this condition, arbitrage-freeness can be characterized by Blackwell dominance, which in turn unifies the arbitrage-free conditions for query pricing \cite{deep2017design} and model pricing \cite{chen2019towards}. Finally, we characterize the revenue-maximizing pricing under restricted query and model menus.
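
The Bayesian view of data and its link to Blackwell dominance can be illustrated with a finite decision problem. In the sketch below, an experiment is a row-stochastic matrix from states to signals, the buyer's value for it is the gain in expected utility from acting on the signal rather than on the prior alone, and a garbling is a second row-stochastic matrix applied to the signals. Blackwell's theorem states that if one experiment is a garbling of another, every Bayesian decision maker weakly prefers the original; the example checks this on two decision problems. The prior, the utilities, and the helper names are illustrative assumptions, and the code uses exact `fractions.Fraction` arithmetic to avoid rounding.

```python
# Hypothetical helpers for illustration; not taken from the paper.
from fractions import Fraction
from typing import List

Matrix = List[List[Fraction]]


def value_of_experiment(prior: List[Fraction], experiment: Matrix, utility: Matrix) -> Fraction:
    """Gain in expected utility from seeing the experiment's signal before acting.

    prior[s]         probability of state s
    experiment[s][k] probability of signal k in state s (each row sums to 1)
    utility[a][s]    payoff of action a in state s
    """
    states = range(len(prior))
    signals = range(len(experiment[0]))
    # Without data: one best action against the prior.
    no_data = max(sum(prior[s] * u[s] for s in states) for u in utility)
    # With data: a best action per signal, scored with the joint probability
    # prior[s] * experiment[s][k] (posteriors are never normalised explicitly).
    with_data = sum(
        max(sum(prior[s] * experiment[s][k] * u[s] for s in states) for u in utility)
        for k in signals
    )
    return with_data - no_data


def garble(experiment: Matrix, garbling: Matrix) -> Matrix:
    """Pass each signal through a row-stochastic garbling matrix: Q = P G."""
    return [
        [sum(row[k] * garbling[k][j] for k in range(len(row))) for j in range(len(garbling[0]))]
        for row in experiment
    ]


if __name__ == "__main__":
    half = Fraction(1, 2)
    prior = [half, half]
    P = [[Fraction(1), Fraction(0)], [Fraction(0), Fraction(1)]]              # perfectly informative
    G = [[Fraction(3, 4), Fraction(1, 4)], [Fraction(1, 4), Fraction(3, 4)]]  # noisy relabelling
    Q = garble(P, G)
    guess = [[Fraction(1), Fraction(0)], [Fraction(0), Fraction(1)]]   # match the state
    risky = [[Fraction(2), Fraction(-1)], [Fraction(0), Fraction(0)]]  # bet or abstain
    print(value_of_experiment(prior, P, guess), value_of_experiment(prior, Q, guess))  # 1/2 1/4
    print(value_of_experiment(prior, P, risky), value_of_experiment(prior, Q, risky))  # 1/2 1/8
```

In both decision problems the garbled experiment is worth less than the original, as Blackwell's theorem predicts; the size of the loss, however, depends on the buyer's utility, which is why pricing by informativeness alone is delicate.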
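
The branch-and-bound algorithm is based on McCormick relaxations. The specific nonconvex terms in the paper's formulation are not given here, so the sketch below only shows the generic technique on a single bilinear term w = x*y over a box: the four McCormick inequalities give linear lower and upper bounds on w, and splitting the box (the branching step) tightens them. The function names and the grid-based width measurement are illustrative assumptions.

```python
# Hypothetical helpers for illustration; a generic McCormick example, not the paper's model.


def mccormick_bounds(x, y, xl, xu, yl, yu):
    """Lower/upper values the McCormick envelope allows for w = x*y at (x, y),
    given bounds xl <= x <= xu and yl <= y <= yu."""
    lower = max(xl * y + x * yl - xl * yl,   # from (x - xl)(y - yl) >= 0
                xu * y + x * yu - xu * yu)   # from (xu - x)(yu - y) >= 0
    upper = min(xu * y + x * yl - xu * yl,   # from (xu - x)(y - yl) >= 0
                xl * y + x * yu - xl * yu)   # from (x - xl)(yu - y) >= 0
    return lower, upper


def max_gap(xl, xu, yl, yu, steps=20):
    """Largest envelope width (upper - lower) over a grid of points in the box."""
    gap = 0.0
    for i in range(steps + 1):
        x = xl + (xu - xl) * i / steps
        for j in range(steps + 1):
            y = yl + (yu - yl) * j / steps
            lower, upper = mccormick_bounds(x, y, xl, xu, yl, yu)
            gap = max(gap, upper - lower)
    return gap


if __name__ == "__main__":
    print(max_gap(0.0, 1.0, 0.0, 1.0))                                # 0.5
    # Branching: bisect x's interval and relax each child box separately.
    print(max_gap(0.0, 0.5, 0.0, 1.0), max_gap(0.5, 1.0, 0.0, 1.0))  # 0.25 0.25
```

On the unit square the envelope is at most 0.5 wide (at x = y = 0.5); after bisecting the x-interval, each child box has maximum width 0.25. Repeatedly branching on variable bounds therefore drives the relaxation toward the true bilinear value, which is what lets branch-and-bound certify a global optimum.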

翻译：受数据在广告、金融和机器学习等应用中价值不断攀升的驱动，数据产品市场日益重要。数据市场主要销售两类产品：数据集和机器学习模型。由于这些产品可以以可忽略的边际成本复制，卖方自然通过查询访问和带噪声的模型发布来对其进行版本化。版本化立即引发了一个套利问题：买方可能通过组合更便宜的购买，以更低的总价恢复信息更丰富的产品。现有关于查询和模型定价的研究在买方价值被视为外生时探讨无套利性，而销售信息的文献则从买方的决策问题中推导价值，但忽略了无套利性。因此，我们研究了卖方的最优数据定价问题，其中买方通过贝叶斯决策评价数据，并且我们施加了无套利约束。我们首先将查询定价和模型定价解释为信息定价的特例，并公式化了一般无套利信息销售问题，展示了其计算难度，并给出了一种基于McCormick松弛的分支定界算法。然后我们考虑了阈值效用，其中买方只有在实验足够信息时才具有正价值。在此条件下，我们发现无套利性可以通过Blackwell支配来刻画，这反过来统一了查询定价\cite{deep2017design}和模型定价\cite{chen2019towards}的无套利条件。最后，我们刻画了受限查询和模型菜单下的收入最大化定价。