We study dynamic pricing where a seller repeatedly interacts with a strategic, non-myopic buyer who has a fixed private valuation and discounts future utility. Prior work has focused exclusively on posted-price mechanisms, where the seller makes a take-it-or-leave-it offer. Our first result shows that menu mechanisms consisting of allocation-payment contracts achieve $O(T_γ)$ regret, where $T_γ$ is the buyer's effective discounted time horizon. We also establish an $Ω(T_γ)$ lower bound, demonstrating that the bound is tight. For a buyer with geometric discounting and a constant discount factor, our bound is $O(1)$, whereas prior bounds for posted-price mechanisms incur an unavoidable $Ω(\log\log T)$ factor in regret. Our second contribution is more conceptual. Dynamic pricing sits at the intersection of two paradigms: learning with strategic agents in computer science and machine learning, and revelation-principle-based mechanism design in economics. The relationship between the two has remained unclear. We establish a fundamental equivalence: indirect learning-based mechanisms and direct revelation mechanisms achieve identical optimal regret. The adaptive, data-driven algorithms of online learning and explicit type elicitation are two languages for solving the same problem.
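As a rough illustration of why a constant discount factor gives an $O(1)$ bound, assume (the abstract does not state this definition) that the buyer weights round $t$ by $γ^{t-1}$ and that $T_γ$ is the sum of these weights over $T$ rounds. Under that assumption,

$$T_γ = \sum_{t=1}^{T} γ^{t-1} = \frac{1-γ^{T}}{1-γ} \le \frac{1}{1-γ},$$

which is a constant independent of $T$ for any fixed $γ \in (0,1)$. For instance, $γ = 0.9$ gives $T_γ \le 10$ regardless of how many rounds are played.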