Online Learning via Offline Greedy Algorithms: Applications in Market Design and Optimization

Motivated by online decision-making in time-varying combinatorial environments, we study the problem of transforming offline algorithms into their online counterparts. We focus on offline combinatorial problems that admit a constant-factor approximation via a greedy algorithm that is robust to local errors. For such problems, we provide a general framework that uses Blackwell approachability to efficiently transform offline robust greedy algorithms into online ones. We show that the resulting online algorithms have $O(\sqrt{T})$ (approximate) regret in the full-information setting. We further introduce a bandit extension of Blackwell approachability, which we call Bandit Blackwell approachability. We leverage this notion to transform robust greedy offline algorithms into online algorithms with $O(T^{2/3})$ (approximate) regret in the bandit setting. Demonstrating the flexibility of our framework, we apply our offline-to-online transformation to several problems at the intersection of revenue management, market design, and online optimization, including product ranking optimization in online platforms, reserve price optimization in auctions, and submodular maximization. We also extend our reduction to greedy-like first-order methods used in continuous optimization, such as those used for maximizing continuous strong DR monotone submodular functions subject to convex constraints. We show that our transformation, when applied to these problems, yields new regret bounds or improves on the best known bounds. We complement our theoretical studies with numerical simulations for two of our applications; in both, the empirical performance of the transformed algorithms on practical instances is better than the theoretical guarantees suggest.
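
To make the "(approximate) regret" benchmark concrete, the following sketch states one common way such a quantity is defined. The notation is an assumption introduced here for illustration and does not come from the abstract: $\gamma \in (0,1]$ denotes the approximation factor of the offline greedy algorithm, $f_1,\dots,f_T$ the per-round reward functions over a feasible set $\mathcal{S}$, and $S_1,\dots,S_T$ the online algorithm's choices (with expectations taken if the algorithm randomizes).

$$\gamma\text{-Regret}(T) \;=\; \gamma \cdot \max_{S \in \mathcal{S}} \sum_{t=1}^{T} f_t(S) \;-\; \sum_{t=1}^{T} f_t(S_t)$$

Under this reading, a bound of $O(\sqrt{T})$ in the full-information setting means the average per-round gap to the $\gamma$-scaled benchmark shrinks as $O(T^{-1/2})$, while the $O(T^{2/3})$ bandit bound corresponds to an average gap of $O(T^{-1/3})$.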
