We study online learning problems in which a decision maker wants to maximize their expected reward without violating a finite set of $m$ resource constraints. By casting the learning process over a suitably defined space of strategy mixtures, we recover strong duality on a Lagrangian relaxation of the underlying optimization problem, even for general settings with non-convex reward and resource-consumption functions. Then, we provide the first best-of-many-worlds-type framework for this setting, with no-regret guarantees under stochastic, adversarial, and non-stationary inputs. Our framework yields the same regret guarantees as prior work in the stochastic case. On the other hand, when budgets grow at least linearly in the time horizon, it allows us to provide a constant competitive ratio in the adversarial case, which improves over the best known upper bound of $O(\log m \log T)$. Moreover, our framework allows the decision maker to handle non-convex reward and cost functions. We provide two game-theoretic applications of our framework to give further evidence of its flexibility. In doing so, we show that it can be employed to implement budget-pacing mechanisms in repeated first-price auctions.