Dynamic Pricing and Learning with Bayesian Persuasion

We consider a novel dynamic pricing and learning setting in which, in addition to setting product prices in sequential rounds, the seller also commits ex ante to 'advertising schemes'. That is, at the beginning of each round the seller decides what kind of signal they will provide to the buyer about the product's quality once it is realized. Using the popular Bayesian persuasion framework to model the effect of these signals on the buyers' valuations and purchase responses, we formulate the problem of finding an advertising scheme, together with a pricing scheme, that maximizes the seller's expected revenue. Without any a priori knowledge of the buyers' demand function, our goal is to design an online algorithm that uses past purchase responses to adaptively learn the optimal pricing and advertising strategy. We study the regret of the algorithm relative to the optimal clairvoyant price and advertising scheme. Our main result is a computationally efficient online algorithm that achieves an $O(T^{2/3}(m\log T)^{1/3})$ regret bound when the valuation function is linear in the product quality. Here $m$ is the cardinality of the discrete product quality domain and $T$ is the time horizon. This result requires some natural monotonicity and Lipschitz assumptions on the valuation function, but no Lipschitz or smoothness assumption on the buyers' demand function. For constant $m$, our result matches, within logarithmic factors, the regret lower bound for dynamic pricing, which is a special case of our problem. We also obtain several improved results for the widely considered special case of additive valuations, including an $\tilde{O}(T^{2/3})$ regret bound independent of $m$ when $m\le T^{1/3}$.
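To make the interaction between a price and an advertising scheme concrete, the following minimal sketch evaluates the seller's expected revenue for one fixed price and one fixed signaling scheme in a toy instance. It is an illustration under stated assumptions, not the algorithm studied here: the function names, the finite set of buyer types with known probabilities, and the specific linear valuation (type times posterior-mean quality) are all hypothetical choices. In the learning problem itself the demand function is unknown to the seller.

```python
import numpy as np

def posteriors(prior, scheme):
    """Bayes-update a prior over m qualities for each of k signals.

    prior:  shape (m,), prior probability of each quality level.
    scheme: shape (m, k), scheme[i, s] = P(signal s | quality i); rows sum to 1.
    Returns (signal_probs, post), where post[s] is the posterior given signal s.
    """
    joint = prior[:, None] * scheme              # P(quality i, signal s)
    signal_probs = joint.sum(axis=0)             # P(signal s)
    # Signals that are never sent get an all-zero posterior row (never used).
    safe = np.where(signal_probs > 0, signal_probs, 1.0)
    return signal_probs, (joint / safe).T

def expected_revenue(price, prior, scheme, qualities, buyer_types, type_probs):
    """Expected revenue of a (price, scheme) pair in the toy model.

    ASSUMPTION (hypothetical): a buyer of type theta values the product at
    theta * E[quality | signal] and buys iff that value is at least the price.
    """
    signal_probs, post = posteriors(prior, scheme)
    mean_quality = post @ qualities                      # E[quality | signal]
    values = np.outer(buyer_types, mean_quality)         # (n_types, k)
    buys = (values >= price).astype(float)
    purchase_prob = type_probs @ buys @ signal_probs
    return price * purchase_prob

# Toy instance: two quality levels, one buyer type.
prior = np.array([0.5, 0.5])
qualities = np.array([0.0, 1.0])
buyer_types = np.array([1.0])
type_probs = np.array([1.0])

full_disclosure = np.eye(2)                        # signal reveals the quality
no_disclosure = np.array([[1.0, 0.0], [1.0, 0.0]]) # same signal for every quality

rev_none = expected_revenue(0.5, prior, no_disclosure, qualities, buyer_types, type_probs)
rev_full = expected_revenue(0.5, prior, full_disclosure, qualities, buyer_types, type_probs)
```

In this toy instance, at price 0.5 the uninformative scheme sells to every buyer while full disclosure sells only when the quality is high, which is why the price and the advertising scheme have to be chosen jointly.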
