We study a search and tracking (S&T) problem in which a team of dynamic search agents must capture an adversarial, evasive agent given only sparse temporal and spatial knowledge of its location. The domain is challenging for traditional Reinforcement Learning (RL) approaches because the large search space leads to sparse observations of the adversary and, in turn, sparse rewards for the search agents. Additionally, the opponent reacts to the search agents' behavior, which causes a data distribution shift during RL training as the search agents improve their policies. We propose a differentiable Multi-Agent RL (MARL) architecture that uses a novel filtering module to supply estimated adversary location information, enabling effective learning of a team policy. Our algorithm learns to balance information from prior knowledge and a motion model so that it remains resilient to the data distribution shift, and it outperforms all baseline methods with a 46% increase in detection rate.
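The idea of balancing prior knowledge against a motion model can be illustrated with a small sketch. This is not the architecture proposed here; it only assumes, for illustration, a constant-velocity motion model and a single scalar gate that mixes the two location estimates. The names `motion_model_predict`, `blend_estimates`, and `gate_logit` are hypothetical, and in a learned system the gate would be produced by a trainable module rather than set by hand.

```python
import numpy as np

def motion_model_predict(last_pos, velocity, dt):
    """Constant-velocity prediction from the last detection (an assumed motion model)."""
    return last_pos + velocity * dt

def blend_estimates(prior_mean, motion_pred, gate_logit):
    """Convex combination of two location estimates.

    gate_logit is a hypothetical learnable scalar; the sigmoid keeps the
    mixing weight strictly between 0 and 1.
    """
    w = 1.0 / (1.0 + np.exp(-gate_logit))
    return w * prior_mean + (1.0 - w) * motion_pred

# Illustrative values only, not data from any experiment.
last_pos = np.array([10.0, 20.0])    # last observed adversary position
velocity = np.array([1.0, -0.5])     # velocity estimated at that detection
prior_mean = np.array([30.0, 30.0])  # location suggested by prior knowledge

pred = motion_model_predict(last_pos, velocity, dt=4.0)
estimate = blend_estimates(prior_mean, pred, gate_logit=0.0)  # equal weighting
```

Because the blend is a smooth function of `gate_logit`, gradients can flow through it, which is the property a differentiable filtering module would rely on when trained jointly with a policy.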