Dynamic pricing models often posit that a $\textbf{stream}$ of customer interactions occurs sequentially, with customers' valuations drawn independently. However, this model does not fully reflect the real world, as it overlooks a critical aspect: the law of diminishing marginal utility, which states that a customer's marginal utility from each additional unit declines. As customers make purchases, the valuation distribution therefore shifts toward the lower end, which the stream model does not capture. This motivates us to study a pool-based model, where a $\textbf{pool}$ of customers repeatedly interacts with a monopolist seller, and each customer's valuation diminishes in the number of purchases made according to a discount function. In particular, when the discount function is constant, our pool model recovers the stream model. We focus on the most fundamental special case, where a customer's valuation becomes zero once a purchase is made. Given $k$ prices, we present a non-adaptive, detail-free (i.e., it does not "know" the valuations) policy that achieves a $1/k$ competitive ratio, which is optimal among non-adaptive policies. Furthermore, based on a novel debiasing technique, we propose an adaptive learn-then-earn policy with a $\tilde O(k^{2/3} n^{2/3})$ regret.
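The contrast between the stream model and the unit-demand pool model can be illustrated with a small simulation. This is only a sketch under assumptions that the abstract does not state: in each round exactly one customer, chosen uniformly at random, sees the posted price, and buys whenever their current valuation is at least that price. The function names `simulate_pool` and `simulate_stream` are hypothetical, and the code does not implement either of the proposed policies.

```python
import random


def simulate_pool(valuations, prices, seed=0):
    """Pool model, unit-demand special case (arrival rule is an assumption).

    Each round one pool member, chosen uniformly at random, sees the posted
    price. A buyer's valuation drops to zero after the purchase.
    """
    rng = random.Random(seed)
    remaining = list(valuations)  # current valuation of each pool member
    revenue = 0.0
    for price in prices:
        i = rng.randrange(len(remaining))
        if remaining[i] >= price:
            revenue += price
            remaining[i] = 0.0  # valuation becomes zero once a purchase is made
    return revenue


def simulate_stream(valuations, prices, seed=0):
    """Stream model: every round draws a fresh valuation, so past purchases
    never change the valuation distribution."""
    rng = random.Random(seed)
    revenue = 0.0
    for price in prices:
        v = rng.choice(valuations)  # independent draw each round
        if v >= price:
            revenue += price
    return revenue


if __name__ == "__main__":
    valuations = [1.0, 2.0, 3.0]
    prices = [2.0] * 1000  # a fixed positive price posted for 1000 rounds
    print("pool revenue:  ", simulate_pool(valuations, prices))
    print("stream revenue:", simulate_stream(valuations, prices))
```

Under these assumptions, a fixed positive price $p$ can earn at most $p$ times the number of pool members whose valuation is at least $p$, however many rounds are played, whereas stream revenue keeps growing with the number of rounds. This is the downward shift in valuations that the stream model misses.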