We study online learning problems in which a decision maker has to make a sequence of costly decisions, with the goal of maximizing their expected reward while adhering to budget and return-on-investment (ROI) constraints. Existing primal-dual algorithms designed for constrained online learning problems under adversarial inputs rely on two fundamental assumptions. First, the decision maker must know in advance the values of parameters related to the degree of strict feasibility of the problem (i.e., the Slater parameters). Second, a strictly feasible solution to the offline optimization problem must exist at each round. Both requirements are unrealistic for practical applications such as bidding in online ad auctions. In this paper, we show how such assumptions can be circumvented by endowing standard primal-dual templates with weakly adaptive regret minimizers. This results in a "dual-balancing" framework which ensures that dual variables stay sufficiently small, even without knowledge of the Slater parameters. We prove the first best-of-both-worlds no-regret guarantees that hold in the absence of the two aforementioned assumptions, under both stochastic and adversarial inputs. Finally, we show how to instantiate the framework to optimally bid in various mechanisms of practical relevance, such as first- and second-price auctions.