We consider a dynamic pricing problem for repeated contextual second-price auctions with multiple strategic buyers who aim to maximize their long-term time-discounted utility. The seller has limited information about buyers' overall demand curves, which depend on a non-parametric market-noise distribution, and buyers may submit corrupted bids (relative to their true valuations) to manipulate the seller's pricing policy and obtain more favorable reserve prices in the future. We focus on designing the seller's learning policy for setting contextual reserve prices, where the seller's goal is to minimize regret relative to the revenue of a clairvoyant benchmark policy that has full information about buyers' demand. We propose a policy with a phased structure that incorporates randomized "isolation" periods, during which a single randomly chosen buyer participates in the auction alone. We show that this design allows the seller to control the number of periods in which buyers significantly corrupt their bids. We then prove that our policy achieves a $T$-period regret of $\widetilde{\mathcal{O}}(\sqrt{T})$ against strategic buyers. Finally, we conduct numerical simulations to compare our proposed algorithm with standard pricing policies. Our numerical results show that our algorithm outperforms these policies under a variety of buyer bidding behaviors.
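To make the auction format concrete, the following is a minimal Python sketch of a single period of a second-price auction with a reserve price, together with a randomized isolation period of the kind described above. It is an illustration under stated assumptions, not the policy analyzed in the paper: the function names `second_price_with_reserve` and `run_period`, the per-period isolation probability `isolation_prob`, and the use of one common reserve price for all buyers are hypothetical choices made here, and the sketch omits how the reserve is learned from contexts and past bids.

```python
import random
from typing import Optional, Sequence, Tuple


def second_price_with_reserve(
    bids: Sequence[float], reserve: float
) -> Tuple[Optional[int], float]:
    """Run one second-price auction with a reserve price.

    Returns (winner_index, payment). If no bid meets the reserve,
    the item goes unsold and the result is (None, 0.0).
    """
    if not bids:
        return None, 0.0
    # Highest bidder wins (ties broken by lowest index).
    winner = max(range(len(bids)), key=lambda i: bids[i])
    if bids[winner] < reserve:
        return None, 0.0
    # The winner pays the larger of the reserve and the second-highest bid.
    others = [b for i, b in enumerate(bids) if i != winner]
    second = max(others) if others else 0.0
    return winner, max(reserve, second)


def run_period(
    bids: Sequence[float],
    reserve: float,
    isolation_prob: float,
    rng: random.Random,
) -> Tuple[Optional[int], float]:
    """Run one period, which may be an isolation period.

    Assumption (hypothetical): each period is an isolation period
    independently with probability `isolation_prob`. In an isolation
    period a single buyer, chosen uniformly at random, participates
    alone, so that buyer wins if and only if the bid meets the reserve
    and then pays exactly the reserve.
    """
    if rng.random() < isolation_prob:
        i = rng.randrange(len(bids))
        winner, payment = second_price_with_reserve([bids[i]], reserve)
        return (i if winner is not None else None), payment
    return second_price_with_reserve(bids, reserve)
```

In a regular period with bids `[0.9, 0.6, 0.3]` and reserve `0.5`, the first buyer wins and pays `0.6`, the second-highest bid. In an isolation period the selected buyer faces only the reserve, so the payment no longer depends on competing bids; intuitively, this is what makes under-bidding costly for a buyer who is trying to push future reserve prices down.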