We consider a dynamic pricing problem where customer response to the current price is influenced by the customer's price expectation, known as the reference price. We study a simple and novel reference price mechanism in which the reference price is the average of the past prices offered by the seller. In contrast to the more commonly studied exponential smoothing mechanism, under our reference price mechanism the prices offered by the seller have a longer-term effect on future customer expectations. We show that under this mechanism, a markdown policy is near-optimal irrespective of the model parameters. This matches the common intuition that a seller may be better off by starting with a higher price and then decreasing it, as customers feel they are getting bargains on items that are ordinarily more expensive. For linear demand models, we also provide a detailed characterization of the near-optimal markdown policy along with an efficient way of computing it. We then consider a more challenging dynamic pricing and learning problem, where the demand model parameters are a priori unknown, and the seller needs to learn them online from the customers' responses to the offered prices while simultaneously optimizing revenue. The objective is to minimize regret, i.e., the $T$-round revenue loss compared to a clairvoyant optimal policy. This task essentially amounts to learning a non-stationary optimal policy in a time-varying Markov Decision Process (MDP). For linear demand models, we provide an efficient learning algorithm with an optimal $\tilde{O}(\sqrt{T})$ regret upper bound.
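The markdown intuition can be illustrated with a minimal simulation. This is a sketch, not the paper's exact model: it assumes a linear demand of the hypothetical form $d_t = a - b p_t + c (r_t - p_t)$, where $r_t$ is the running average of past offered prices, and compares a linearly decreasing price path against a constant price with the same average. The parameter values and the markdown schedule are illustrative choices, not taken from the paper.

```python
# Sketch of the average-reference-price mechanism with a linear demand model.
# Assumed demand form: d_t = a - b*p_t + c*(r_t - p_t), where r_t is the
# average of all past offered prices; parameters a, b, c are hypothetical.

def simulate(prices, a=10.0, b=1.0, c=0.5):
    """Return total revenue over the horizon for a given price path."""
    total, ref_sum = 0.0, 0.0
    for t, p in enumerate(prices):
        # Reference price: average of past prices (first round has no history,
        # so the reference effect is zero there).
        r = ref_sum / t if t > 0 else p
        demand = a - b * p + c * (r - p)  # customers respond to a perceived bargain
        total += p * demand
        ref_sum += p
    return total

T = 20
markdown = [6.0 - 2.0 * t / (T - 1) for t in range(T)]  # 6.0 down to 4.0
constant = [5.0] * T                                    # same average price

# Starting high builds a high reference price, so later discounted rounds
# enjoy a demand boost from (r_t - p_t) > 0.
print(simulate(markdown), simulate(constant))
```

Under these assumed parameters the markdown path collects more total revenue than the constant-price path, because the early high prices inflate the reference price that later, lower prices are compared against.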