Decision-Oriented Learning with Differentiable Submodular Maximization for Vehicle Routing Problem

We study the problem of learning a function that maps context observations (input) to the parameters of a submodular function (output). Our motivating case study is a vehicle routing problem in which a team of Unmanned Ground Vehicles (UGVs) serves as mobile charging stations that recharge a team of Unmanned Aerial Vehicles (UAVs) executing persistent monitoring tasks. We want to learn the mapping from observations of UAV task routes and the wind field to the parameters of a submodular objective function that describes the distribution of the UAVs' landing positions. Traditionally, such a learning problem is solved as an independent prediction phase, without considering the downstream task optimization phase. However, the loss function used for prediction may be misaligned with the final goal, i.e., a good routing decision: good performance in the isolated prediction phase does not necessarily lead to good decisions in the downstream routing task. In this paper, we propose a framework that incorporates task optimization as a differentiable layer in the prediction phase. The framework allows end-to-end training of the prediction model without an engineered intermediate loss that targets prediction performance alone. Task optimization (submodular maximization) is made differentiable by introducing stochastic perturbations into deterministic algorithms (i.e., stochastic smoothing). We demonstrate the efficacy of the proposed framework on synthetic data. Experimental results on the mobile charging station routing problem show that the framework yields better routing decisions than the separate prediction-then-optimization approach, e.g., a higher average number of UAVs recharged.
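To make the idea of stochastic smoothing concrete, the following is a minimal sketch and not the algorithm proposed in the paper. It assumes a toy weighted-coverage objective (a standard submodular function) whose item weights play the role of the predicted parameters, a plain greedy maximizer, and Gaussian perturbations of those weights. The function names `greedy_coverage` and `smoothed_greedy`, the noise scale `sigma`, and the sample count are all hypothetical choices made for illustration. Averaging the greedy output over perturbations gives a smooth function of the weights, and its Jacobian can be estimated with the standard Gaussian score-function identity.

```python
import numpy as np

def greedy_coverage(weights, cover, k):
    """Deterministic greedy for weighted coverage (toy submodular objective).

    cover[i, j] is True if candidate i covers item j; weights[j] is the
    value of covering item j. Returns a 0/1 indicator of the k chosen candidates.
    """
    n, m = cover.shape
    chosen = np.zeros(n, dtype=bool)
    covered = np.zeros(m, dtype=bool)
    for _ in range(k):
        # Marginal gain of each candidate: weight of items it would newly cover.
        gains = (cover & ~covered) @ weights
        gains[chosen] = -np.inf  # never pick the same candidate twice
        best = int(np.argmax(gains))
        chosen[best] = True
        covered |= cover[best]
    return chosen.astype(float)

def smoothed_greedy(weights, cover, k, sigma=0.5, n_samples=200, seed=0):
    """Monte Carlo estimate of E[greedy(w + sigma * Z)] and its Jacobian w.r.t. w.

    Assumption: Z is standard Gaussian, so the Jacobian of the smoothed
    selection equals E[greedy(w + sigma * Z) Z^T] / sigma.
    """
    rng = np.random.default_rng(seed)
    n = cover.shape[0]
    mean_sel = np.zeros(n)
    jac = np.zeros((n, weights.size))
    for _ in range(n_samples):
        z = rng.standard_normal(weights.size)
        sel = greedy_coverage(weights + sigma * z, cover, k)
        mean_sel += sel
        jac += np.outer(sel, z) / sigma
    return mean_sel / n_samples, jac / n_samples

# Tiny usage example: 3 candidate sites, 4 items, pick k = 2 sites.
cover = np.array([[1, 1, 0, 0],
                  [0, 1, 1, 0],
                  [0, 0, 0, 1]], dtype=bool)
w = np.array([1.0, 1.0, 1.0, 5.0])
probs, jacobian = smoothed_greedy(w, cover, k=2)
```

In an end-to-end pipeline, an estimated Jacobian of this kind is what would let a decision-level loss be backpropagated through the selection step to the model that predicts the weights. The deterministic greedy output alone is piecewise constant in the weights and therefore gives zero gradient almost everywhere.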
