Decision-focused learning (DFL) is an increasingly popular paradigm for training predictive models whose outputs are used in decision-making tasks. Instead of merely optimizing for predictive accuracy, DFL trains models to directly minimize the loss associated with downstream decisions. However, existing studies focus solely on scenarios where a fixed batch of data is available and the objective function does not change over time. We instead investigate DFL in dynamic environments where the objective function and data distribution evolve over time. This setting is challenging for online learning because the objective function is generally non-convex and has zero or undefined gradients, which prevents the use of standard first-order optimization methods. To address these difficulties, we (i) regularize the objective to make it differentiable and (ii) use perturbation techniques along with a near-optimal oracle to overcome non-convexity. Combining these techniques yields two original online algorithms tailored for DFL, for which we establish static and dynamic regret bounds, respectively. These are the first provable guarantees for the online decision-focused problem. Finally, we showcase the effectiveness of our algorithms on a knapsack experiment, where they outperform two standard benchmarks.
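To illustrate the gradient issue mentioned in the abstract, the following minimal sketch uses a toy decision problem (pick one item out of three) rather than the knapsack experiment. It compares the finite-difference gradient of the realized decision value under a hard argmax decision with the gradient under an entropy-regularized (softmax) decision. Everything here is an assumption made for illustration: the toy problem, the function names, the `temperature` parameter, and the choice of entropy regularization, which is one common smoothing and not necessarily the regularizer used in the paper.

```python
import numpy as np

def hard_decision(pred_values):
    """Pick the single item with the highest predicted value (one-hot vector)."""
    w = np.zeros_like(pred_values)
    w[np.argmax(pred_values)] = 1.0
    return w

def soft_decision(pred_values, temperature=0.5):
    """Entropy-regularized decision over the simplex, i.e. a softmax.

    `temperature` is a hypothetical smoothing parameter for this sketch.
    """
    z = (pred_values - pred_values.max()) / temperature
    e = np.exp(z)
    return e / e.sum()

def decision_value(decision, true_values):
    """Value actually obtained when the decision meets the true item values."""
    return float(decision @ true_values)

def finite_diff_grad(fn, x, eps=1e-4):
    """Central finite-difference gradient of a scalar function fn at x."""
    grad = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        grad[i] = (fn(x + step) - fn(x - step)) / (2 * eps)
    return grad

# Toy data (hypothetical): the prediction wrongly ranks item 0 first.
true_values = np.array([1.0, 3.0, 2.0])
pred_values = np.array([2.5, 2.0, 1.0])

# Hard decision: small changes in the prediction do not change the argmax,
# so the realized value is locally constant and its gradient is zero.
hard_grad = finite_diff_grad(
    lambda p: decision_value(hard_decision(p), true_values), pred_values
)

# Regularized decision: the value varies smoothly with the prediction,
# so the gradient carries a usable training signal.
soft_grad = finite_diff_grad(
    lambda p: decision_value(soft_decision(p), true_values), pred_values
)

print("hard gradient:", hard_grad)
print("soft gradient:", soft_grad)
```

In this toy instance the hard gradient vanishes in every coordinate, while the regularized gradient is positive for item 1 (the truly best item) and negative for item 0, which is the direction a first-order method would need in order to correct the prediction.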