Using reinforcement learning to autonomously identify sources of error for agents in group missions

When agents operate as a swarm to execute a mission, the command base frequently observes some of them failing suddenly. It is generally difficult to determine whether such a failure is caused by actuators (hypothesis $h_a$) or by sensors (hypothesis $h_s$) by relying solely on the communication between the command base and the agent concerned. However, by instigating collusion between the agents, the cause of failure can be identified; in other words, we expect to detect corresponding displacements for $h_a$ but not for $h_s$. In this study, we considered whether artificial intelligence can autonomously generate an action plan $\boldsymbol{g}$ that pinpoints the cause in this way. Because the expected response to $\boldsymbol{g}$ generally depends on the adopted hypothesis [let the difference be denoted by $D(\boldsymbol{g})$], the task can be formulated in terms of $D(\boldsymbol{g})$. Although a $\boldsymbol{g}^*$ that maximizes $D(\boldsymbol{g})$ would be a suitable action plan for this task, such an optimization is difficult to achieve using the conventional gradient method, because $D(\boldsymbol{g})$ is nonzero only in rare events such as collisions with other agents, and most swarm actions $\boldsymbol{g}$ give $D(\boldsymbol{g})=0$. In other words, $D(\boldsymbol{g})$ has zero gradient throughout almost the entire space of $\boldsymbol{g}$, so the gradient method is not applicable. To overcome this problem, we formulated an action plan using Q-table reinforcement learning. Surprisingly, the optimal action plan generated via reinforcement learning presented a human-like solution: it pinpointed the cause by making other agents collide with the failed agent. Using this simple prototype, we demonstrated the potential of applying Q-table reinforcement learning methods to plan autonomous actions that pinpoint the causes of failure.
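As a rough illustration of why a Q-table can succeed where a gradient method stalls, the following minimal sketch applies tabular Q-learning to a toy problem with the same sparse-signal structure. Everything in it is an assumption made for illustration and is not the setup used in the study: a one-dimensional corridor of six cells, a single helper agent that starts at cell 0, a stationary failed agent in the last cell, and a reward that stands in for $D(\boldsymbol{g})$ by being nonzero only on collision. The names `step`, `train`, and `greedy_plan` are hypothetical.

```python
import random

# Hypothetical toy world: a 1-D corridor with cells 0..5 (assumption).
N_CELLS = 6
FAILED_CELL = N_CELLS - 1   # the failed agent is assumed to sit still here
ACTIONS = (-1, +1)          # the helper agent moves left or right


def step(state, action):
    """Move the helper; the reward (a stand-in for D) is nonzero only on collision."""
    nxt = min(max(state + action, 0), FAILED_CELL)
    collided = nxt == FAILED_CELL
    return nxt, (1.0 if collided else 0.0), collided


def train(episodes=500, alpha=0.5, gamma=0.9, max_steps=200, seed=0):
    """Tabular Q-learning with a uniformly random behaviour policy (off-policy)."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_CELLS)]
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            a = rng.randrange(len(ACTIONS))
            nxt, reward, done = step(state, ACTIONS[a])
            # Bootstrapped target: the rare collision reward propagates backwards.
            target = reward + (0.0 if done else gamma * max(q[nxt]))
            q[state][a] += alpha * (target - q[state][a])
            state = nxt
            if done:
                break
    return q


def greedy_plan(q, start=0, max_steps=20):
    """Read the action plan off the learned table by acting greedily."""
    plan, state = [], start
    for _ in range(max_steps):
        a = max(range(len(ACTIONS)), key=lambda i: q[state][i])
        plan.append(ACTIONS[a])
        state, _, done = step(state, ACTIONS[a])
        if done:
            break
    return plan


q_table = train()
print(greedy_plan(q_table))
```

In this sketch the reward is zero for every move except the final collision, so there is no local slope to follow. The Q-table update nevertheless copies the collision value back one cell at a time, and the greedy plan ends up steering the helper straight into the failed agent.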
