We study a game between autobidding algorithms that compete in an online advertising platform. Each autobidder is tasked with maximizing its advertiser's total value over multiple rounds of a repeated auction, subject to budget and/or return-on-investment (ROI) constraints. We propose a gradient-based learning algorithm that is guaranteed to satisfy all constraints and achieves vanishing individual regret. Our algorithm requires only bandit feedback and works with the first-price auction, the second-price auction, and any "intermediate" auction format. Our main result is that when these autobidders play against each other, the resulting expected liquid welfare over all rounds is at least half of the expected optimal liquid welfare achievable by any allocation. This holds whether or not the bidding dynamics converge to an equilibrium and regardless of the correlation structure between advertiser valuations.
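The main result is stated in terms of liquid welfare, so a small worked instance may help make that benchmark concrete. The Python sketch below assumes the standard budget-capped definition (each advertiser contributes the minimum of its budget and the total value it receives) and assumes values are already scaled so that the ROI constraint reads "total payment ≤ total value". The function names `liquid_welfare` and `optimal_liquid_welfare` are hypothetical. The optimum is found by brute force over integral allocations only, whereas the paper's benchmark may allow randomized allocations. Nothing here implements the proposed bidding algorithm.

```python
import math
from itertools import product


def liquid_welfare(values, budgets, allocation):
    """Liquid welfare of an integral allocation (assumed budget-capped definition).

    values[i][t]  -- value of advertiser i for the single slot in round t
    budgets[i]    -- budget of advertiser i (math.inf if only ROI-constrained)
    allocation[t] -- index of the advertiser who receives the slot in round t
    """
    totals = [0.0] * len(budgets)
    for t, winner in enumerate(allocation):
        totals[winner] += values[winner][t]
    # Each advertiser counts for at most what it could pay: min(B_i, V_i).
    return sum(min(b, v) for b, v in zip(budgets, totals))


def optimal_liquid_welfare(values, budgets):
    """Brute-force maximum over all n**T integral allocations (tiny instances only)."""
    n, rounds = len(values), len(values[0])
    return max(
        liquid_welfare(values, budgets, alloc)
        for alloc in product(range(n), repeat=rounds)
    )


if __name__ == "__main__":
    # Hypothetical toy instance: two advertisers, three rounds.
    values = [[3, 3, 3], [2, 2, 2]]
    budgets = [4, math.inf]  # advertiser 0 is budget-capped, advertiser 1 is not
    print(liquid_welfare(values, budgets, (0, 0, 0)))  # 4.0
    print(optimal_liquid_welfare(values, budgets))  # 7.0
```

In this toy instance, giving every round to the higher-value advertiser yields liquid welfare 4, while the optimum is 7 (one round to advertiser 0, two to advertiser 1). This shows that maximizing raw value and maximizing liquid welfare can disagree once budgets bind.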