Inverse Submodular Maximization with Application to Human-in-the-Loop Multi-Robot Multi-Objective Coverage Control

We consider a new type of inverse combinatorial optimization, Inverse Submodular Maximization (ISM), for human-in-the-loop multi-robot coordination. Forward combinatorial optimization, i.e., solving a combinatorial problem given its reward- or cost-related parameters, is widely used in multi-robot coordination. In the standard pipeline, domain experts design these parameters offline, and the parameters are then used to coordinate robots online. What if non-expert human supervisors, who watch over the robots during a task, need to change these parameters to adapt to new requirements? We are interested in the case where human supervisors suggest which actions to take and the robots must update their internal parameters based on those suggestions. We study such problems from the perspective of inverse combinatorial optimization, i.e., finding parameters given solutions to the problem. Specifically, we propose a new formulation of ISM in which we seek a new set of parameters that deviates minimally from the current parameters and makes the greedy algorithm output the same actions as those suggested by humans. We show that such problems can be formulated as a Mixed Integer Quadratic Program (MIQP). However, the MIQP involves exponentially many binary variables, making it intractable for existing solvers when the problem size is large. We propose a new algorithm under the Branch-and-Bound paradigm to solve such problems. In numerical simulations, we demonstrate how to use ISM in multi-robot multi-objective coverage control, and we show that the proposed algorithm achieves significant advantages in running time and peak memory usage over directly using an existing solver.
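To make the forward/inverse distinction concrete, the following is a minimal sketch, not the formulation or algorithm of the paper. It assumes, purely for illustration, that the parameters are nonnegative weights on per-objective coverage counts, so the objective is a weighted sum of coverage functions. The names `COVERAGE`, `value`, `greedy`, and `matches_suggestion` are hypothetical. The sketch runs the greedy algorithm (the forward problem) and checks whether a candidate weight vector reproduces a human-suggested action sequence, which is the feasibility condition an ISM solution would have to satisfy.

```python
from typing import Dict, List, Sequence, Set

# Hypothetical toy instance: each action covers a set of cells per objective.
# Index 0 and index 1 are two illustrative coverage objectives.
COVERAGE: Dict[str, List[Set[int]]] = {
    "a": [{1, 2, 3}, {10}],
    "b": [{3, 4}, {10, 11, 12}],
    "c": [{5}, {13}],
}


def value(selected: Sequence[str],
          coverage: Dict[str, List[Set[int]]],
          weights: Sequence[float]) -> float:
    """Weighted sum of per-objective coverage counts.

    Assumed objective form: monotone submodular when all weights are >= 0.
    """
    total = 0.0
    for k, w in enumerate(weights):
        covered: Set[int] = set()
        for action in selected:
            covered |= coverage[action][k]
        total += w * len(covered)
    return total


def greedy(coverage: Dict[str, List[Set[int]]],
           weights: Sequence[float],
           budget: int) -> List[str]:
    """Forward problem: pick `budget` actions by largest marginal gain."""
    selected: List[str] = []
    for _ in range(budget):
        base = value(selected, coverage, weights)
        best, best_gain = None, float("-inf")
        for action in sorted(coverage):  # ties broken by action name
            if action in selected:
                continue
            gain = value(selected + [action], coverage, weights) - base
            if gain > best_gain:
                best, best_gain = action, gain
        if best is None:  # fewer actions than budget
            break
        selected.append(best)
    return selected


def matches_suggestion(coverage: Dict[str, List[Set[int]]],
                       weights: Sequence[float],
                       suggestion: Sequence[str]) -> bool:
    """Inverse-side check: do these weights make greedy output the suggestion?"""
    return greedy(coverage, weights, len(suggestion)) == list(suggestion)


current_weights = [1.0, 0.2]    # hypothetical parameters designed offline
candidate_weights = [0.2, 1.0]  # hypothetical adjusted parameters
human_suggestion = ["b", "c"]   # hypothetical supervisor suggestion

print(greedy(COVERAGE, current_weights, 2))
print(matches_suggestion(COVERAGE, current_weights, human_suggestion))
print(matches_suggestion(COVERAGE, candidate_weights, human_suggestion))
```

In this toy instance the current weights lead greedy to a different sequence than the one suggested, while the candidate weights reproduce it. An ISM solver would search among all such consistent weight vectors for the one closest to the current parameters; the sketch only checks consistency and does not perform that search.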
