A recent stream of structured learning approaches has improved the practical state of the art for a range of combinatorial optimization problems with complex objectives encountered in operations research. Such approaches train policies that chain a statistical model with a surrogate combinatorial optimization oracle to map any instance of the problem to a feasible solution. The key idea is to exploit the statistical distribution over instances instead of dealing with instances separately. However, learning such policies by risk minimization is challenging because the empirical risk is piecewise constant in the parameters, and few theoretical guarantees have been provided so far. In this article, we investigate methods that smooth the risk by perturbing the policy, which eases optimization and improves generalization. Our main contribution is a generalization bound that controls the perturbation bias, the statistical learning error, and the optimization error. Our analysis relies on the introduction of a uniform weak property, which captures and quantifies the interplay of the statistical model and the surrogate combinatorial optimization oracle. This property holds under mild assumptions on the statistical model, the surrogate optimization, and the instance data distribution. We illustrate the result on a range of applications such as stochastic vehicle scheduling. In particular, such policies are relevant for contextual stochastic optimization and our results cover this case.
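To make the smoothing idea concrete, the following minimal Python sketch contrasts the piecewise-constant empirical risk of a toy policy with a Monte Carlo estimate of its perturbed counterpart. Everything here is an illustrative assumption rather than the article's construction: the one-parameter linear model, the argmax oracle over a handful of items, the additive Gaussian noise on the scores, and all names (`oracle`, `policy`, `empirical_risk`, `perturbed_risk`, `instances`) are hypothetical.

```python
import random


def oracle(theta):
    """Toy surrogate oracle: return the index with the largest score."""
    return max(range(len(theta)), key=lambda i: theta[i])


def policy(w, features):
    """Unperturbed policy: a linear statistical model chained with the oracle."""
    return oracle([w * f for f in features])


def empirical_risk(w, instances):
    """Average cost of the policy's solutions; piecewise constant in w."""
    return sum(costs[policy(w, feats)] for feats, costs in instances) / len(instances)


def perturbed_risk(w, instances, eps=0.5, n_samples=200, seed=0):
    """Monte Carlo estimate of the risk when Gaussian noise (an assumed
    perturbation) is added to the scores before calling the oracle."""
    rng = random.Random(seed)  # fixed seed: same noise draws for every w
    total = 0.0
    for feats, costs in instances:
        for _ in range(n_samples):
            theta = [w * f + eps * rng.gauss(0.0, 1.0) for f in feats]
            total += costs[oracle(theta)]
    return total / (len(instances) * n_samples)


# Each toy instance: (per-item features, per-item true costs).
instances = [
    ([1.0, -1.0], [0.0, 1.0]),
    ([-0.5, 2.0], [3.0, 1.0]),
]

if __name__ == "__main__":
    for w in (0.25, 0.5, 1.0):
        print(w, empirical_risk(w, instances), perturbed_risk(w, instances))
```

In this toy setting the unperturbed risk takes the same value for every positive `w`, so it gives a gradient-based optimizer no signal, whereas the perturbed estimate varies with `w`. The noise scale `eps` trades smoothness against the perturbation bias mentioned in the abstract.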