We study dynamic pricing in which a seller repeatedly interacts with a strategic, non-myopic buyer who has a fixed private valuation and discounts future utility. Prior work focused exclusively on posted-price mechanisms, which extract only binary accept/reject signals. As our first result, we show that menu mechanisms, which offer allocation-payment contracts, achieve $O(T_\gamma \log T_\gamma)$ regret, where $T_\gamma$ is the buyer's effective discounted time horizon, improving on all prior bounds. Our second contribution is more conceptual. Dynamic pricing sits at the intersection of two paradigms, adaptive learning in computer science and machine learning and revelation-principle-based mechanism design in economics, yet their relationship has remained unclear. We establish a fundamental equivalence: indirect learning mechanisms and direct revelation mechanisms achieve identical optimal regret. The adaptive, data-driven algorithms of online learning and explicit type elicitation are two languages for solving the same problem; hence, learning is revelation in disguise.