Revenue-Optimal Pricing for Budget-Constrained Buyers in Data Markets

We study revenue-optimal pricing in data markets with rational, budget-constrained buyers. Such a market offers multiple datasets for sale, and buyers aim to improve the accuracy of their prediction tasks by acquiring data bundles. For each dataset, the market sets a pricing function, which maps the number of records purchased from the dataset to a non-negative price. The market's objective is to set these pricing functions to maximize total revenue, considering that buyers with quasi-linear utilities choose their bundles optimally under budget constraints. We analyze optimal pricing when each dataset's pricing function is only required to be monotone and lower-continuous. Surprisingly, even with this generality, optimal pricing has a highly structured form: it is piecewise linear and convex (PLC) and can be computed efficiently via an LP. Moreover, the total number of kinks across all pricing functions is bounded by the number of buyers. Thus, when datasets far outnumber buyers, most pricing functions are effectively linear. This motivates studying linear pricing, where each record in a dataset is priced uniformly. Although competitive equilibrium gives revenue-optimal linear prices in rivalrous markets with quasi-linear buyers, we show that revenue maximization under linear pricing in data markets is APX-hard. Hence, a striking computational dichotomy emerges: fully general (nonlinear) pricing admits a polynomial-time algorithm, while the simpler linear scheme is APX-hard. Despite the hardness, we design a 2-approximation algorithm when datasets arrive online, and a $(1-1/e)^{-1}$-approximation algorithm for the offline setting. Our framework lays the groundwork for exploring more general pricing schemes, richer utility models, and a deeper understanding of how market structure -- rivalrous versus non-rivalrous -- shapes revenue-optimal pricing.
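
To put the two approximation guarantees on a common scale, the offline ratio can be evaluated numerically. The reading below assumes the usual convention for maximization problems, in which an $\alpha$-approximation with $\alpha \ge 1$ guarantees at least a $1/\alpha$ fraction of the optimal revenue; the abstract does not state this convention explicitly.

$$
\left(1 - \frac{1}{e}\right)^{-1} = \frac{e}{e-1} \approx 1.582,
\qquad
\frac{1}{1.582} \approx 0.632 = 1 - \frac{1}{e},
\qquad
\frac{1}{2} = 0.5 .
$$

Under that assumption, the offline algorithm would recover at least about 63.2% of the optimal linear-pricing revenue, compared with at least 50% for the online 2-approximation.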
