In multi-agent informative path planning (MAIPP), agents must collectively construct a global belief map of an underlying distribution of interest (e.g., gas concentration, light intensity, or pollution levels) over a given domain, based on measurements taken along their trajectories. They must frequently replan their paths to balance the exploration of new areas with the exploitation of known high-interest areas, so as to maximize information gain within a predefined budget. Traditional approaches rely on reactive path planning conditioned on other agents' predicted future actions. However, because the belief is continuously updated, the predicted actions may not match the executed actions, which introduces noise and reduces performance. We propose a decentralized deep reinforcement learning (DRL) approach using an attention-based neural network, in which agents optimize long-term individual and cooperative objectives by sharing their intent, represented as a distribution over medium-/long-term future positions obtained from their own policy. Intent sharing enables agents to learn to claim or avoid broader areas, while attention mechanisms allow them to identify the useful portions of imperfect predictions, maximizing cooperation even when the shared information is imperfect. Our experiments compare the performance of our approach, its variants, and high-quality baselines across various MAIPP scenarios. Finally, we demonstrate the effectiveness of our approach under limited communication ranges, as a step towards deployment under realistic communication constraints.
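To make the notion of intent concrete, the following is a minimal sketch of how a distribution over future positions could be estimated by rolling out an agent's own stochastic policy. It is an illustration under stated assumptions, not the method described above: the graph representation, the function name `sample_intent`, the `policy` callable, and the parameters `horizon` and `n_rollouts` are all hypothetical, and a uniform random policy stands in for the learned attention-based network.

```python
import random
from collections import Counter


def sample_intent(start, neighbors, policy, horizon=5, n_rollouts=200, seed=0):
    """Estimate a distribution over positions visited within `horizon` steps.

    Hypothetical helper: `neighbors` maps each node to its adjacent nodes and
    `policy(node, nxt)` returns an unnormalized preference for moving to `nxt`.
    """
    rng = random.Random(seed)
    visits = Counter()
    for _ in range(n_rollouts):
        node = start
        for _ in range(horizon):
            options = neighbors[node]
            weights = [policy(node, nxt) for nxt in options]
            # Sample the next position in proportion to the policy's preference.
            node = rng.choices(options, weights=weights, k=1)[0]
            visits[node] += 1
    total = sum(visits.values())
    # Normalize visit counts into a probability distribution over nodes.
    return {node: count / total for node, count in visits.items()}


# Toy example: four nodes on a line, with a uniform stand-in policy.
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}


def uniform_policy(node, nxt):
    return 1.0


intent = sample_intent(0, neighbors, uniform_policy)
```

In this sketch, `intent` is the kind of summary an agent could broadcast to its teammates: it says where the agent is likely to be over the next few steps rather than committing to a single predicted path, so a receiver can weigh the parts it finds reliable.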