We study online learning problems in which a decision maker has to make a sequence of costly decisions, with the goal of maximizing their expected reward while adhering to budget and return-on-investment (ROI) constraints. Previous work requires the decision maker to know beforehand some specific parameters related to the degree of strict feasibility of the offline problem. Moreover, when inputs are adversarial, it requires the existence of a strictly feasible solution to the offline optimization problem at each round. Both requirements are unrealistic for practical applications such as bidding in online ad auctions. We propose a best-of-both-worlds primal-dual framework which circumvents both assumptions by exploiting the notion of interval regret, providing guarantees under both stochastic and adversarial inputs. Our proof techniques can be applied to both input models with minimal modifications, thereby providing a unified perspective on the two problems. Finally, we show how to instantiate the framework to optimally bid in various mechanisms of practical relevance, such as first- and second-price auctions.
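To make the setting concrete, the following is a minimal sketch of a generic primal-dual bidding loop for repeated second-price auctions under a budget and an ROI constraint. It is not the framework proposed in this paper: it uses plain projected gradient steps on the dual variables rather than regret minimizers with interval-regret guarantees, and it carries none of the stated guarantees. The function name `primal_dual_bidding`, the parameters `target_roi` and `step`, the quasi-linear reward (value minus payment), and the per-round spend target `budget / n` are all assumptions introduced for illustration.

```python
def primal_dual_bidding(values, competing_bids, budget, target_roi=1.0, step=0.05):
    """Illustrative Lagrangian bidding loop (hypothetical, not the paper's algorithm).

    values[t]         -- bidder's value for the item at round t
    competing_bids[t] -- highest competing bid, i.e. the second-price payment on a win
    budget            -- hard cap on total spend
    target_roi        -- assumed ROI target: total value >= target_roi * total spend
    step              -- assumed constant step size for the dual updates
    """
    n = len(values)
    rho = budget / n          # per-round spend target
    lam, mu = 0.0, 0.0        # dual variables for the budget and ROI constraints
    remaining = budget
    total_value, total_spend = 0.0, 0.0

    for v, d in zip(values, competing_bids):
        # Primal step: in a second-price auction, the per-round Lagrangian
        # (v - d) - lam * d + mu * (v - target_roi * d) is maximized by bidding
        # the value scaled by the current multipliers.
        bid = (1.0 + mu) * v / (1.0 + lam + mu * target_roi)
        bid = min(bid, remaining)  # the payment never exceeds the bid, so this caps spend

        win = bid > d
        cost = d if win else 0.0
        gain = v if win else 0.0

        remaining -= cost
        total_value += gain
        total_spend += cost

        # Dual step: projected gradient update on the observed constraint violations.
        lam = max(0.0, lam + step * (cost - rho))
        mu = max(0.0, mu + step * (target_roi * cost - gain))

    return total_value, total_spend
```

Calling `primal_dual_bidding(values, competing_bids, budget=10.0)` on two equal-length sequences returns the total value won and the total spend. Because the bid is capped by the remaining budget and a second-price payment is below the winning bid, the spend never exceeds `budget`; the ROI constraint is only enforced softly through `mu`.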