Leveraging Reviews: Learning to Price with Buyer and Seller Uncertainty

In online marketplaces, customers have access to hundreds of reviews for a single product. Buyers often use reviews from other customers who share their type -- such as height for clothing, skin type for skincare products, and location for outdoor furniture -- to estimate their values, which they may not know a priori. Customers with few relevant reviews may hesitate to make a purchase except at a low price, so for the seller, there is a tension between setting high prices and ensuring that there are enough reviews for buyers to confidently estimate their values. Simultaneously, sellers may use reviews to gauge the demand for items they wish to sell. In this work, we study this pricing problem in an online setting where the seller interacts with a set of buyers of finitely many types, one by one, over a series of $T$ rounds. In each round, the seller first sets a price. Then a buyer arrives and examines the reviews of the previous buyers with the same type, which reveal those buyers' ex-post values. Based on the reviews, the buyer decides to purchase if they have good reason to believe that their ex-ante utility is positive. Crucially, the seller does not know the buyer's type when setting the price, nor even the distribution over types. We provide a no-regret algorithm that the seller can use to obtain high revenue. When there are $d$ types, after $T$ rounds, our algorithm achieves a problem-independent $\tilde O(T^{2/3}d^{1/3})$ regret bound. However, when the smallest probability $q_{\text{min}}$ that any given type appears is large, specifically when $q_{\text{min}} \in \Omega(d^{-2/3}T^{-1/3})$, the same algorithm achieves a $\tilde O(T^{1/2}q_{\text{min}}^{-1/2})$ regret bound. We complement these upper bounds with matching lower bounds in both regimes, showing that our algorithm is minimax optimal up to lower-order terms.
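
The threshold on $q_{\text{min}}$ can be read off by comparing the two upper bounds directly. The following is a back-of-the-envelope check that ignores the logarithmic factors hidden by $\tilde O$; it is not the argument used in the analysis:

$$
T^{1/2} q_{\text{min}}^{-1/2} \le T^{2/3} d^{1/3}
\iff q_{\text{min}}^{-1/2} \le T^{1/6} d^{1/3}
\iff q_{\text{min}} \ge d^{-2/3} T^{-1/3}.
$$

So the $q_{\text{min}}$-dependent bound is the smaller of the two exactly in the regime $q_{\text{min}} \in \Omega(d^{-2/3}T^{-1/3})$. As an illustration with assumed values $d = 8$ and $T = 10^6$ (chosen only for easy arithmetic), the threshold is $8^{-2/3} \cdot (10^6)^{-1/3} = \tfrac{1}{4} \cdot 10^{-2} = 0.0025$, well below the largest possible value $q_{\text{min}} = 1/d = 0.125$.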
