Persistent monitoring of dynamic targets is essential in real-world applications such as disaster response, environmental sensing, and wildlife conservation, where mobile agents must continuously gather information under uncertainty. We propose COMPASS, a multi-agent reinforcement learning (MARL) framework that enables decentralized agents to efficiently and persistently monitor multiple moving targets. We model the environment as a graph, where nodes represent spatial locations and edges capture topological proximity, allowing agents to reason over structured layouts and revisit informative regions as needed. Each agent independently selects actions via a shared spatio-temporal attention network that integrates historical observations with spatial context. We model target dynamics using Gaussian Processes (GPs), which support principled belief updates and enable uncertainty-aware planning. We train COMPASS with centralized value estimation and decentralized policy execution under an adaptive reward setting. Extensive experiments demonstrate that COMPASS consistently outperforms strong baselines in uncertainty reduction, target coverage, and coordination efficiency across dynamic multi-target scenarios.
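To make the GP-based belief update concrete, the following is a minimal sketch (our own illustration, not the paper's exact model) of a GP posterior over a target's 1-D position as a function of time. The kernel choice (RBF), lengthscale, and noise level are assumptions for illustration; the key point is that posterior variance grows away from past observations, which is the signal an uncertainty-aware planner uses to decide which regions to revisit.

```python
import numpy as np

def rbf(a, b, length=1.0, var=1.0):
    """Squared-exponential (RBF) kernel between time points a and b."""
    d = a[:, None] - b[None, :]
    return var * np.exp(-0.5 * (d / length) ** 2)

def gp_posterior(t_obs, y_obs, t_query, noise=0.1):
    """Posterior mean and variance of target position at query times."""
    K = rbf(t_obs, t_obs) + noise**2 * np.eye(len(t_obs))
    Ks = rbf(t_query, t_obs)
    Kss = rbf(t_query, t_query)
    # Cholesky-based solve for numerical stability
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_obs))
    mean = Ks @ alpha
    v = np.linalg.solve(L, Ks.T)
    var = np.diag(Kss) - np.sum(v**2, axis=0)
    return mean, var

# Noisy observations of a target drifting along y = 0.5 * t
t_obs = np.array([0.0, 1.0, 2.0, 3.0])
y_obs = 0.5 * t_obs + np.array([0.02, -0.01, 0.03, 0.0])
t_query = np.array([1.5, 5.0])
mean, var = gp_posterior(t_obs, y_obs, t_query)
# Variance is low near past observations (t=1.5) and high far from them
# (t=5.0), so stale regions naturally attract monitoring effort.
```

In the full framework, each agent would maintain such a belief per target and feed the resulting uncertainty into its policy; this fragment only shows the belief-update step.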