This paper addresses the coordination of a team of robots traversing a route in the presence of adversaries at random positions. Our goal is to minimize the overall team cost, which is determined by (i) the risk accumulated while robots remain in adversary-impacted zones and (ii) the mission completion time. During traversal, a robot can reduce its speed and act as a `guard' (the slower, the better), which decreases the risk incurred by a particular adversary. This creates a trade-off between the robots' guarding behaviors and their travel speeds. The formulated problem is highly non-convex and cannot be efficiently solved by existing algorithms. Our approach begins with a theoretical analysis of the robots' behaviors in the single-adversary case. As the problem scales up, computing the optimal solution with optimization-based approaches becomes intractable; we therefore employ reinforcement learning techniques, developing new encoding and policy-generating methods. Simulations demonstrate that our learning methods efficiently produce team coordination behaviors. We discuss the reasoning behind these behaviors and explain why they reduce the overall team cost.