We study a game between autobidding algorithms that compete in an online advertising platform. Each autobidder is tasked with maximizing its advertiser's total value over multiple rounds of a repeated auction, subject to budget and/or return-on-investment (ROI) constraints. We propose a gradient-based learning algorithm that is guaranteed to satisfy all constraints and achieves vanishing individual regret. Our algorithm uses only bandit feedback and can be used with first-price or second-price auctions, as well as with any "intermediate" auction format. Our main result is that when these autobidders play against each other, the resulting expected liquid welfare over all rounds is at least half of the expected optimal liquid welfare achievable by any allocation. This holds whether or not the bidding dynamics converge to an equilibrium, and regardless of the correlation structure between advertiser valuations.
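For readers unfamiliar with liquid welfare, the main result can be written schematically as follows. This is only a sketch: the abstract fixes no notation, so every symbol below (n, T, x, v, B, gamma, W) is an assumption, and the definition of liquid welfare shown is one common form for budget- and ROI-constrained advertisers, not necessarily the exact one used in the paper.

```latex
% Assumed notation (not given in the abstract):
%   n advertisers, T rounds;
%   x_{i,t} \in [0,1]  allocation to advertiser i in round t,
%   v_{i,t}            advertiser i's value in round t,
%   B_i                advertiser i's total budget,
%   \gamma_i > 0       advertiser i's ROI target (value must be at least \gamma_i times spend).

% Liquid welfare of an allocation x: each advertiser's value,
% capped by the most it could pay under its constraints.
\[
  W(x) \;=\; \sum_{i=1}^{n} \min\!\Big( B_i,\; \frac{1}{\gamma_i} \sum_{t=1}^{T} v_{i,t}\, x_{i,t} \Big)
\]

% Main result, schematically: x is the allocation produced when all
% autobidders run the proposed algorithm; the maximum ranges over all allocations x'.
\[
  \mathbb{E}\big[ W(x) \big] \;\ge\; \tfrac{1}{2}\; \mathbb{E}\Big[ \max_{x'} W(x') \Big]
\]
```

The precise statement and its assumptions are given in the body of the paper, not in this abstract.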