We present a differentiable, decision-oriented learning framework for cost prediction in a class of multi-robot decision-making problems in which the robots must trade off task performance against the cost of the actions they select. Specifically, we consider settings where task performance is measured by a known monotone submodular function (e.g., coverage or mutual information) and the cost of each action depends on the context (e.g., wind and terrain conditions), so a function that maps the context to the costs must be learned. Classically, this learning problem and the downstream decision-making problem are treated as decoupled: a cost predictor is first trained without regard to the downstream decision-making problem, and its predictions are then used as inputs to that problem. However, the loss function used to train the predictor may not be aligned with the quality of the downstream decisions. We propose a decision-oriented learning framework that incorporates downstream task performance into the prediction phase via a differentiable optimization layer. The main computational challenge in such a framework is making the combinatorial optimization differentiable: the decision problem is a non-monotone submodular maximization, and the mapping from predicted costs to its solution is not naturally differentiable. To address this, we propose the Differentiable Cost Scaled Greedy algorithm (D-CSG), a continuous and differentiable relaxation of the Cost Scaled Greedy (CSG) algorithm. We demonstrate the efficacy of the proposed framework through numerical simulations, whose results show that it can achieve better performance than the traditional two-stage approach.
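The abstract does not spell out how CSG is relaxed, so the following is only a minimal sketch under stated assumptions, not the authors' D-CSG. It assumes that the task value is set coverage, that the greedy rule scores each action by its marginal coverage gain minus `scale` times its predicted cost (the default `scale=2.0` is an assumption), and that the hard argmax can be replaced by a temperature-controlled softmax. All function and parameter names here are hypothetical.

```python
import numpy as np


def coverage(selected, cover_sets):
    """Monotone submodular task value: number of distinct items covered."""
    covered = set()
    for a in selected:
        covered |= cover_sets[a]
    return len(covered)


def scaled_gains(selected, cover_sets, costs, scale=2.0):
    """Marginal coverage gain of each unselected action minus its scaled cost.

    Already-selected actions get -inf so they are never picked again.
    The scaling factor is an assumption of this sketch.
    """
    base = coverage(selected, cover_sets)
    gains = np.full(len(cover_sets), -np.inf)
    for a in range(len(cover_sets)):
        if a not in selected:
            gain = coverage(selected + [a], cover_sets) - base
            gains[a] = gain - scale * costs[a]
    return gains


def cost_scaled_greedy(cover_sets, costs, budget, scale=2.0):
    """Hard (non-differentiable) greedy: argmax of the scaled gain each round."""
    selected = []
    for _ in range(budget):
        gains = scaled_gains(selected, cover_sets, costs, scale)
        best = int(np.argmax(gains))
        if gains[best] <= 0:  # no remaining action is worth its cost
            break
        selected.append(best)
    return selected


def soft_selection(gains, temperature=0.1):
    """Softmax relaxation of one argmax step (assumes a finite gain exists).

    Returns selection probabilities that vary smoothly with the predicted
    costs and approach a one-hot argmax as the temperature goes to zero.
    """
    z = gains / temperature
    z = z - np.max(z)   # stabilise the exponentials
    w = np.exp(z)       # exp(-inf) = 0 masks already-selected actions
    return w / w.sum()
```

In a decision-oriented pipeline, the costs would come from a learned predictor, and the gradient of the downstream objective would flow through the soft selection probabilities rather than through a hard argmax. An automatic-differentiation library would be needed for that step; NumPy is used here only to keep the sketch self-contained.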