We consider a dynamic pricing problem where customer response to the current price is impacted by the customers' price expectation, also known as the reference price. We study a simple and novel reference price mechanism in which the reference price is the average of the past prices offered by the seller. In contrast to the more commonly studied exponential smoothing mechanism, under our reference price mechanism the prices offered by the seller have a longer-term effect on future customer expectations. We show that under this mechanism, a markdown policy is near-optimal irrespective of the model parameters. This matches the common intuition that a seller may be better off starting with a higher price and then decreasing it, as customers feel they are getting bargains on items that are ordinarily more expensive. For linear demand models, we also provide a detailed characterization of the near-optimal markdown policy, along with an efficient way of computing it. We then consider a more challenging dynamic pricing and learning problem, where the demand model parameters are a priori unknown, and the seller needs to learn them online from the customers' responses to the offered prices while simultaneously optimizing revenue. The objective is to minimize regret, i.e., the $T$-round revenue loss compared to a clairvoyant optimal policy. This task essentially amounts to learning a non-stationary optimal policy in a time-variant Markov Decision Process (MDP). For linear demand models, we provide an efficient learning algorithm with an optimal $\tilde{O}(\sqrt{T})$ regret upper bound.
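The averaged-reference-price dynamics can be illustrated with a minimal simulation sketch. This is not the paper's model or algorithm; it assumes a generic linear demand of the form $d_t = a - b\,p_t + c\,(r_t - p_t)$, where $r_t$ is the average of all previously offered prices, and all parameter values below are purely illustrative.

```python
def simulate_revenue(prices, a=10.0, b=1.0, c=0.5, r0=5.0):
    """Total revenue over a price schedule when the reference price is
    the running average of past offered prices (r0 before any price is seen).

    Assumed linear demand: d_t = a - b*p_t + c*(r_t - p_t).
    All parameters here are hypothetical, for illustration only.
    """
    revenue = 0.0
    past = []
    for p in prices:
        # Reference price = average of the past prices offered by the seller.
        r = sum(past) / len(past) if past else r0
        # Demand rises when the current price sits below the reference price.
        demand = max(a - b * p + c * (r - p), 0.0)
        revenue += p * demand
        past.append(p)
    return revenue

# A markdown (decreasing) schedule vs. a flat schedule with the same mean price:
markdown = [6.0, 5.5, 5.0, 4.5, 4.0]
flat = [5.0] * 5
```

With these illustrative parameters, the markdown schedule earns more than the flat one: the early high prices inflate the reference price, so later discounts look like bargains and stimulate demand, in line with the intuition stated above.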