Learning communication strategies in cooperative multi-agent reinforcement learning (MARL) has recently attracted considerable attention. Early studies typically assumed a fully connected communication topology among agents, which incurs high communication costs and may be infeasible in practice. Some recent works have developed adaptive communication strategies to reduce communication overhead, but these methods cannot effectively obtain valuable information from agents that are beyond the communication range. In this paper, we consider a realistic communication model in which each agent has a limited communication range and the communication topology changes dynamically. To facilitate effective agent communication, we propose a novel communication protocol called Adaptively Controlled Two-Hop Communication (AC2C). After an initial round of local communication, AC2C employs an adaptive two-hop communication strategy, implemented by a communication controller, that enables long-range information exchange among agents to boost performance. This controller determines whether each agent should request two-hop messages, which helps reduce communication overhead during distributed execution. We evaluate AC2C on three cooperative multi-agent tasks, and the experimental results show that it outperforms relevant baselines while incurring lower communication costs.
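The sketch below illustrates only the communication pattern described above, under simplifying assumptions that are ours and not the authors'. Agents are points in the plane, and two agents can communicate if and only if their distance is at most the communication range. A message is represented solely by the set of agent ids whose information it carries. The learned communication controller is replaced by a hypothetical callable `request_two_hop`. The names `build_topology`, `two_hop_exchange`, and `sparse_only` are illustrative and do not come from the paper. This is not the AC2C implementation.

```python
import math
from typing import Callable, Dict, List, Set, Tuple

Position = Tuple[float, float]


def build_topology(positions: List[Position], comm_range: float) -> Dict[int, List[int]]:
    """Link agents i and j iff they are within comm_range (recomputed each step,
    so the topology changes as agents move)."""
    n = len(positions)
    return {
        i: [j for j in range(n)
            if j != i and math.dist(positions[i], positions[j]) <= comm_range]
        for i in range(n)
    }


def two_hop_exchange(
    positions: List[Position],
    comm_range: float,
    request_two_hop: Callable[[int, Set[int]], bool],
) -> Tuple[Dict[int, Set[int]], int]:
    """Return, per agent, the ids of agents whose information reached it, plus the
    number of payload messages sent (request signals are not counted here)."""
    nbrs = build_topology(positions, comm_range)
    messages = 0

    # Round 1: local communication; every in-range neighbor j sends one message to i.
    one_hop: Dict[int, Set[int]] = {}
    for i, js in nbrs.items():
        one_hop[i] = set(js)
        messages += len(js)

    # Round 2: a controller (hypothetical stand-in for the learned one) decides,
    # per agent, whether to ask its neighbors for two-hop messages.
    received: Dict[int, Set[int]] = {}
    for i, js in nbrs.items():
        info = set(one_hop[i])
        if request_two_hop(i, one_hop[i]):
            for j in js:
                # Neighbor j relays its round-1 inbox (minus i itself) as one message.
                info |= one_hop[j] - {i}
                messages += 1
        received[i] = info
    return received, messages


def sparse_only(agent_id: int, inbox: Set[int]) -> bool:
    """Hypothetical threshold rule: only agents with fewer than 2 neighbors ask."""
    return len(inbox) < 2


line = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
received, cost = two_hop_exchange(line, 1.0, sparse_only)
# received[0] == {1, 2} and cost == 8 (12 if every agent asks, 6 if none does)
```

In this toy setting, the end agents gain access to information from agents outside their range, while the well-connected middle agents skip the second round. This illustrates, in miniature, the trade-off between long-range information and communication overhead that the controller is meant to manage.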