Learning in Budgeted Auctions with Spacing Objectives

In many repeated auction settings, participants care not only about how frequently they win but also about how their wins are distributed over time. This problem arises in practical domains where avoiding congested demand is crucial, such as online retail sales and compute services, as well as in advertising campaigns that require sustained visibility over time. We introduce a simple model of this phenomenon: a budgeted auction in which the value of a win is a concave function of the time since the last win. This implies that, for a given number of wins, even spacing over time is optimal. We also extend our model and results to the case where not all wins result in "conversions" (realization of actual gains), and the probability of conversion depends on a context. The goal then becomes to maximize, and evenly space, conversions rather than just wins. We study the optimal policies for this setting in second-price auctions and offer learning algorithms for the bidders that achieve low regret against the optimal bidding policy in a Bayesian online setting. Our main result is a computationally efficient online learning algorithm that achieves $\tilde O(\sqrt T)$ regret. We achieve this by showing that an infinite-horizon Markov decision process (MDP) with the budget constraint holding in expectation is essentially equivalent to our problem, even when that MDP is restricted to a very small number of states. The algorithm achieves low regret by learning a bidding policy that chooses bids as a function of the context and the system's state, namely the time elapsed since the last win (or conversion). We show that state-independent strategies incur linear regret even when there is no uncertainty about conversions. We complement this by showing that there are state-independent strategies that, while still incurring linear regret, achieve a $(1-\frac 1 e)$ approximation to the optimal reward.
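The claim that even spacing is optimal for a fixed number of wins follows from concavity (Jensen's inequality). The Python snippet below is a minimal sketch that checks it numerically under assumptions of our own, not taken from the model's formal definition: time is discrete, the i-th win is worth f(g_i) where g_i is the number of rounds since the previous win (the first gap is counted from round 0), the gaps sum to at most the horizon T, and f is increasing as well as concave (here the hypothetical choice f = sqrt). The helper names `total_value`, `even_gaps`, and `best_gaps_brute_force` are illustrative only.

```python
import math
from itertools import product


def total_value(gaps, f):
    """Total value of a win sequence given its inter-win gaps.

    Assumption (illustrative): the i-th win is worth f(g_i), where g_i is
    the number of rounds since the previous win (first gap from round 0).
    """
    return sum(f(g) for g in gaps)


def even_gaps(horizon, k):
    """Split `horizon` rounds into k integer gaps that differ by at most 1."""
    q, r = divmod(horizon, k)
    return [q + 1] * r + [q] * (k - r)


def best_gaps_brute_force(horizon, k, f):
    """Exhaustively search all k-tuples of positive gaps with sum <= horizon."""
    best, best_val = None, -math.inf
    for gaps in product(range(1, horizon + 1), repeat=k):
        if sum(gaps) <= horizon:
            val = total_value(gaps, f)
            if val > best_val:
                best, best_val = list(gaps), val
    return best, best_val


# Hypothetical concave, increasing value-of-spacing function.
f = math.sqrt

T, k = 12, 3
print(total_value(even_gaps(T, k), f))  # 3 * sqrt(4) = 6.0
print(total_value([1, 1, 10], f))       # clustered wins: 2 + sqrt(10), about 5.16
print(best_gaps_brute_force(T, k, f))   # ([4, 4, 4], 6.0)
```

With integer gaps, "even" means gaps that differ by at most one round, which is what `even_gaps` produces when T is not divisible by k; the exhaustive search is only meant for tiny horizons and serves as a sanity check rather than a bidding method.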
