This work focuses on the persistent monitoring problem, where a set of targets moving according to an unknown model must be monitored by an autonomous mobile robot with a limited sensing range. To keep each target's position estimate as accurate as possible, the robot needs to adaptively plan its path to (re-)visit all the targets and update its belief from measurements collected along the way. In doing so, the main challenge is to strike a balance between exploitation, i.e., re-visiting previously located targets, and exploration, i.e., finding new targets or re-acquiring lost ones. Encouraged by recent advances in deep reinforcement learning, we introduce an attention-based neural solution to the persistent monitoring problem, where the agent can learn the inter-dependencies between targets, i.e., their spatial and temporal correlations, conditioned on past measurements. This endows the agent with the ability to determine which target, time, and location to attend to across multiple scales, which we show also helps relax the usual limitations of a finite target set. We experimentally demonstrate that our method outperforms baselines in terms of the number of target visits and the average estimation error in complex environments. Finally, we implement and validate our model in a drone-based experiment that monitors mobile ground targets in a high-fidelity simulator.