In modern advertising platforms, budget-constrained bidders deploy learning algorithms to maximize their accumulated value. These algorithms typically come with classical utility guarantees such as no-regret: the agent's utility is at least that of some benchmark, under the assumption that every other agent's bidding remains the same. Such guarantees have compelling properties: they are optimal against stationary competition distributions, and in unconstrained settings the empirical distribution of play induced by no-regret dynamics approximates a Coarse Correlated Equilibrium. However, no-regret algorithms are easily manipulable, and in budgeted settings no stronger notion of regret (such as swap regret) is currently known that would limit such manipulation. We propose a very simple learning algorithm for budgeted sequential auctions in which agents maximize their total number of wins, and show that it has surprisingly appealing properties. We analyze this algorithm from two perspectives. First, we show that when an agent holding a $\rho$ fraction of the total budget uses this algorithm, she is guaranteed to win at least $\rho T - O(\sqrt{T})$ of the $T$ rounds. This guarantee holds under adversarial behavior by the other agents, as long as they respect their own budget constraints. Second, we examine the scenario in which all agents follow our algorithm. By the first result, every agent's total number of wins is proportional to her budget, up to the additive $O(\sqrt{T})$ term. We further show that this holds in a much stronger sense: after an initial period of $O(\sqrt{T} \log T)$ rounds, every agent enjoys the same guarantee over any time interval. For intervals of length $O(\sqrt{T})$, the deviation from the proportional number of wins is an additive constant.
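As a purely hypothetical illustration of the budget-proportionality question, the toy simulation below runs sequential first-price auctions among budget-constrained agents, each following a naive pacing heuristic (bid remaining budget divided by remaining rounds). The heuristic and all names here are assumptions for illustration only, not the algorithm analyzed in the paper:

```python
import random

def simulate(budgets, T, seed=0):
    """Run T sequential first-price auctions among budget-constrained
    agents, each following a naive pacing heuristic: bid
    remaining_budget / remaining_rounds. Illustrative only -- this is
    NOT the paper's algorithm. Returns per-agent win counts."""
    rng = random.Random(seed)
    remaining = list(budgets)
    wins = [0] * len(budgets)
    for t in range(T):
        rounds_left = T - t
        bids = [b / rounds_left for b in remaining]
        top = max(bids)
        # Break ties uniformly at random among the highest bidders.
        winner = rng.choice([i for i, b in enumerate(bids) if b == top])
        wins[winner] += 1
        remaining[winner] -= bids[winner]  # first-price: winner pays her bid
    return wins

# Two agents holding 60% and 40% of the total budget.
print(simulate([0.6, 0.4], T=1000))
```

Under this naive heuristic, the larger-budget agent wins noticeably more than her proportional share of $0.6T$ rounds while the smaller agent falls short of hers, which hints at why a more careful algorithm is needed to guarantee every agent $\rho T - O(\sqrt{T})$ wins simultaneously.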