Learning communication strategies in cooperative multi-agent reinforcement learning (MARL) has recently attracted considerable attention. Early studies typically assumed a fully connected communication topology among agents, which incurs high communication costs and may not be feasible in practice. Some recent works have developed adaptive communication strategies to reduce communication overhead, but these methods cannot effectively obtain valuable information from agents beyond the communication range. In this paper, we consider a realistic communication model in which each agent has a limited communication range and the communication topology changes dynamically. To facilitate effective agent communication, we propose a novel communication protocol called Adaptively Controlled Two-Hop Communication (AC2C). After an initial round of local communication, AC2C employs an adaptive two-hop communication strategy, implemented by a communication controller, to enable long-range information exchange among agents and thereby boost performance. This controller determines whether each agent should request two-hop messages, which helps to reduce the communication overhead during distributed execution. We evaluate AC2C on three cooperative multi-agent tasks, and the experimental results show that it outperforms relevant baselines while incurring lower communication costs.
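To make the range-limited, two-hop idea concrete, the following is a minimal sketch and not the AC2C implementation. It assumes agents are points in the plane, that two agents can communicate when their Euclidean distance is at most a hypothetical `comm_range`, and that the controller's decision is given as a fixed list of booleans (`wants_two_hop`) rather than learned. The function names are illustrative. The sketch only computes which agents' information becomes reachable; it does not model how messages are encoded or aggregated.

```python
import math


def one_hop_neighbors(positions, comm_range):
    """For each agent, return the set of other agents within comm_range."""
    n = len(positions)
    neighbors = [set() for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(positions[i], positions[j]) <= comm_range:
                neighbors[i].add(j)
                neighbors[j].add(i)
    return neighbors


def two_hop_sources(neighbors, wants_two_hop):
    """For each agent that requests it, return the agents reachable in
    exactly two hops (neighbors of neighbors, excluding itself and its
    one-hop neighbors). Agents that do not request get an empty set."""
    result = []
    for i, nbrs in enumerate(neighbors):
        if not wants_two_hop[i]:
            result.append(set())
            continue
        reach = set()
        for j in nbrs:
            reach |= neighbors[j]
        result.append(reach - nbrs - {i})
    return result


# Four agents on a line, one unit apart; each can only reach adjacent agents.
positions = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
neighbors = one_hop_neighbors(positions, comm_range=1.0)

# Assumed controller output: only agents 0 and 3 request two-hop messages.
wants_two_hop = [True, False, False, True]
two_hop = two_hop_sources(neighbors, wants_two_hop)
```

In this toy setting, agent 0 gains access to agent 2's information through agent 1, while agents 1 and 2 skip the second round entirely, which is where the savings in communication overhead would come from.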