This paper proposes an exploration technique for multi-agent reinforcement learning (MARL) with graph-based communication among agents. We assume that the individual rewards received by the agents are independent of the other agents' actions, while their policies are coupled. In the proposed framework, neighbouring agents collaborate to estimate the uncertainty over the state-action space and use this estimate to explore more efficiently. Unlike existing works, the proposed algorithm does not require counting mechanisms and can be applied to continuous-state environments without complex conversion techniques. Moreover, the scheme lets agents communicate in a fully decentralized manner with minimal information exchange: in continuous-state scenarios, each agent needs to exchange only a single parameter vector. We verify the performance of the algorithm with theoretical results for discrete-state scenarios and with experiments for continuous-state ones.
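The abstract does not say how neighbouring agents fuse their uncertainty estimates. The sketch below is purely illustrative and rests on two assumptions. First, each agent's estimate is summarised by one parameter vector. Second, neighbours fuse these vectors by weighted averaging with Metropolis weights. This is a standard decentralized consensus step and not necessarily the authors' rule. The sketch shows why exchanging a single vector per neighbour per round can be enough for agents on a connected graph to agree on a shared estimate without any central coordinator. The functions `metropolis_weights`, `combine_step`, and `run_consensus` are hypothetical names introduced here.

```python
from typing import Dict, List

Vector = List[float]
Graph = Dict[int, List[int]]        # undirected graph: agent -> list of neighbours
Weights = Dict[int, Dict[int, float]]


def metropolis_weights(graph: Graph) -> Weights:
    """Metropolis combination weights (an assumed choice, for illustration).

    The resulting matrix is symmetric with rows summing to 1, so repeated
    averaging on a connected graph converges to the network-wide mean.
    """
    weights: Weights = {}
    for i, nbrs in graph.items():
        row = {j: 1.0 / (1 + max(len(nbrs), len(graph[j]))) for j in nbrs}
        row[i] = 1.0 - sum(row.values())  # self-weight keeps the row stochastic
        weights[i] = row
    return weights


def combine_step(params: Dict[int, Vector], weights: Weights) -> Dict[int, Vector]:
    """One communication round: each agent reads only its neighbours' vectors."""
    new_params: Dict[int, Vector] = {}
    for i, row in weights.items():
        dim = len(params[i])
        new_params[i] = [
            sum(w * params[j][k] for j, w in row.items()) for k in range(dim)
        ]
    return new_params


def run_consensus(graph: Graph, params: Dict[int, Vector], rounds: int) -> Dict[int, Vector]:
    """Repeat the local combine step; no agent ever sees the full network."""
    weights = metropolis_weights(graph)
    for _ in range(rounds):
        params = combine_step(params, weights)
    return params


if __name__ == "__main__":
    # Hypothetical 3-agent path graph 0 - 1 - 2 with 2-dimensional parameter vectors.
    path = {0: [1], 1: [0, 2], 2: [1]}
    local = {0: [3.0, 0.0], 1: [0.0, 3.0], 2: [0.0, 0.0]}
    print(run_consensus(path, local, rounds=200))  # every agent approaches the mean [1.0, 1.0]
```

In a full method, each round would also include a local update of the vector from the agent's own experience before the combine step. That update is omitted here because the abstract does not describe it.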