Intent-based Deep Reinforcement Learning for Multi-agent Informative Path Planning

In multi-agent informative path planning (MAIPP), agents must collectively construct a global belief map of an underlying distribution of interest (e.g., gas concentration, light intensity, or pollution levels) over a given domain, based on measurements taken along their trajectories. They must frequently replan their paths to balance the distributed exploration of new areas against the collective, meticulous exploitation of known high-interest areas, so as to maximize the information gained within a predefined budget (e.g., path length or working time). A common approach to achieving such cooperation plans each agent's path reactively, conditioned on the other agents' predicted future actions. However, because the agents' beliefs are updated continuously, these predicted actions may not be the ones the agents eventually execute, which introduces noise/inaccuracy into the system and often decreases performance. In this work, we propose a decentralized deep reinforcement learning (DRL) approach to MAIPP that relies on an attention-based neural network, in which agents optimize long-term individual and cooperative objectives by explicitly sharing their intent (i.e., a distribution over medium-/long-term future positions, obtained from their individual policy) in a reactive, asynchronous manner. Intent sharing thus allows agents to learn to claim/avoid broader areas of the world. Moreover, since our approach relies on learned attention over these shared intents, agents can learn to recognize the useful portion(s) of these imperfect predictions and so maximize cooperation despite imperfect information. Our comparison experiments demonstrate the performance of our approach relative to its variants and high-quality baselines over a large set of MAIPP simulations.
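To make the notion of intent more concrete, the following is a minimal sketch, not the method described in this work. It assumes a small grid world, estimates an agent's intent as the empirical distribution of positions visited in Monte Carlo rollouts of its policy, and lets an agent weight the intents shared by others with plain scaled dot-product attention. The names `intent_map`, `attend`, and `uniform_policy`, the grid size, and the rollout parameters are all hypothetical, and the attention here has no learned projections, unlike a trained network.

```python
import numpy as np

# Four grid moves: right, left, down, up (an assumed action set).
MOVES = np.array([[0, 1], [0, -1], [1, 0], [-1, 0]])


def intent_map(policy, start, grid_shape, horizon=8, n_rollouts=200, rng=None):
    """Estimate a distribution over future positions by sampling rollouts.

    `policy(pos)` must return 4 action probabilities that sum to 1.
    """
    rng = np.random.default_rng() if rng is None else rng
    upper = np.array(grid_shape) - 1
    counts = np.zeros(grid_shape)
    for _ in range(n_rollouts):
        pos = np.array(start)
        for _ in range(horizon):
            action = rng.choice(4, p=policy(pos))
            pos = np.clip(pos + MOVES[action], 0, upper)  # stay on the grid
            counts[pos[0], pos[1]] += 1
    return counts / counts.sum()


def attend(query, keys, values):
    """Scaled dot-product attention of one query vector over n items."""
    scores = keys @ query / np.sqrt(query.shape[0])
    weights = np.exp(scores - scores.max())  # numerically stable softmax
    weights /= weights.sum()
    return weights @ values, weights


def uniform_policy(pos):
    """Placeholder policy (assumption): every move is equally likely."""
    return np.full(4, 0.25)


rng = np.random.default_rng(0)
grid = (6, 6)
own = intent_map(uniform_policy, (0, 0), grid, rng=rng).ravel()
others = np.stack([
    intent_map(uniform_policy, start, grid, rng=rng).ravel()
    for start in [(5, 5), (0, 1)]
])
# The agent queries the shared intents with its own intent vector.
context, weights = attend(own, others, others)
```

In this toy setup, `weights` indicates how strongly the agent attends to each shared intent, and `context` is the resulting weighted mixture of the other agents' intent maps. In a learned model, the query, keys, and values would come from trained projections rather than the raw maps.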
