We study a data pricing problem in which a seller has access to $N$ homogeneous data points (e.g., drawn i.i.d. from some distribution). There are $m$ types of buyers in the market; all buyers of type $i$ share the same valuation curve $v_i:[N]\rightarrow [0,1]$, where $v_i(n)$ is the value of having $n$ data points. The seller does not know the distribution of buyers a priori, but can repeat the market for $T$ rounds in order to learn the revenue-optimal pricing curve $p:[N] \rightarrow [0, 1]$. To solve this online learning problem, we first develop novel discretization schemes that approximate any pricing curve. Compared to prior work, the size of our discretization schemes scales more gracefully with the approximation parameter, which translates into better regret in online learning. Under assumptions such as smoothness and diminishing returns, which are satisfied by data, the discretization size can be reduced further. We then turn to the online learning problem in both the stochastic and adversarial settings. On each round, the seller chooses an anonymous pricing curve $p_t$. A new buyer then appears and may choose to purchase some amount of data; she reveals her type only if she makes a purchase. Our online algorithms build on classical algorithms such as UCB and FTPL, but require novel ideas to account for the asymmetric nature of this feedback and to deal with the vastness of the space of pricing curves. Using our improved discretization schemes, we achieve $\tilde{O}(m\sqrt{T})$ regret in the stochastic setting and $\tilde{O}(m^{3/2}\sqrt{T})$ regret in the adversarial setting.
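To make the market model and its asymmetric feedback concrete, the following is a minimal Python sketch of a single round and of the expected revenue of a fixed pricing curve. It is not the paper's learning algorithm. It assumes quasi-linear buyer utility $v_i(n) - p(n)$, that a buyer buys the utility-maximizing amount only when that utility is strictly positive, and that ties go to the smaller purchase; the abstract does not specify these details, and the function names (`buyer_choice`, `expected_revenue`, `play_round`) are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def buyer_choice(valuation: Sequence[float], prices: Sequence[float]) -> int:
    """Return the number of data points n a buyer purchases (0 = no purchase).

    Index n of `valuation` and `prices` holds v(n) and p(n) for n = 1..N;
    index 0 is the "buy nothing" option with v(0) = p(0) = 0.
    Assumption (not from the source): quasi-linear utility v(n) - p(n); the
    buyer purchases only if her best utility is strictly positive, and ties
    are broken toward the smaller amount.
    """
    best_n, best_utility = 0, 0.0
    for n in range(1, len(prices)):
        utility = valuation[n] - prices[n]
        if utility > best_utility:
            best_n, best_utility = n, utility
    return best_n


def expected_revenue(
    prices: Sequence[float],
    valuations: Sequence[Sequence[float]],
    type_probs: Sequence[float],
) -> float:
    """Expected revenue of a pricing curve when type i arrives w.p. type_probs[i]."""
    return sum(
        q * prices[buyer_choice(v, prices)] for v, q in zip(valuations, type_probs)
    )


def play_round(
    prices: Sequence[float],
    valuations: Sequence[Sequence[float]],
    buyer_type: int,
) -> Tuple[int, float, Optional[int]]:
    """One round of the online protocol with asymmetric feedback.

    The seller observes the amount purchased and the payment, but learns the
    buyer's type only if she buys something (n > 0); otherwise it sees None.
    """
    n = buyer_choice(valuations[buyer_type], prices)
    revealed_type = buyer_type if n > 0 else None
    return n, prices[n], revealed_type


if __name__ == "__main__":
    # Hypothetical example with N = 2 data points and m = 2 buyer types.
    valuations = [[0.0, 0.5, 0.6], [0.0, 0.2, 0.9]]
    prices = [0.0, 0.4, 0.8]
    print(expected_revenue(prices, valuations, [0.5, 0.5]))
    print(play_round(prices, valuations, buyer_type=1))
```

In this toy example, the first type buys one data point at price 0.4 and the second type buys two at price 0.8, so the expected revenue under equal type probabilities is approximately 0.6. If prices are raised so that no type has positive utility, `play_round` returns no purchase and `None` for the type, which is exactly the missing feedback the online algorithms must cope with.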