Achieving distributed reinforcement learning (RL) for large-scale cooperative multi-agent systems (MASs) is challenging because (i) each agent has access to only limited information, and (ii) convergence or computational complexity issues arise from the curse of dimensionality. In this paper, we propose a general, computationally efficient distributed framework for cooperative multi-agent reinforcement learning (MARL) that exploits the graph structures involved in the problem. We introduce three coupling graphs that describe three types of inter-agent coupling in MARL, namely, the state graph, the observation graph, and the reward graph. By further considering a communication graph, we propose two distributed RL approaches based on local value functions derived from the coupling graphs. The first approach significantly reduces sample complexity under specific conditions on these four graphs. The second approach provides an approximate solution and can be efficient even for problems with dense coupling graphs, subject to a trade-off between minimizing the approximation error and reducing the computational complexity. Simulations show that our RL algorithms scale significantly better to large-scale MASs than centralized and consensus-based distributed RL algorithms.
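To make the idea of a local value function derived from coupling graphs more concrete, the following is a minimal illustrative sketch, not the construction used in the paper. It assumes a hypothetical edge convention: a state edge `(u, v)` means agent `u` influences agent `v`'s next state, an observation edge `(u, v)` means agent `v` observes agent `u`, a reward edge `(u, v)` means agent `v`'s reward depends on agent `u`, and every agent's reward depends on itself. Under these assumptions, an agent's local return only needs the rewards of the agents its actions can eventually affect. The function names `reachable`, `affected_reward_sets`, and `local_return` are invented for illustration.

```python
from collections import deque


def reachable(adj, start):
    """Return all nodes reachable from `start` (inclusive) via BFS."""
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen


def affected_reward_sets(n, state_edges, obs_edges, reward_edges):
    """For each agent i, return the agents whose rewards i can influence.

    Assumed (hypothetical) semantics: influence propagates along the union
    of state and observation edges, then takes one hop along a reward edge.
    Each agent's reward is assumed to depend on its own state.
    """
    influence = {i: set() for i in range(n)}
    for u, v in list(state_edges) + list(obs_edges):
        influence[u].add(v)

    # reward_dependents[k] = agents whose reward depends on agent k
    reward_dependents = {i: {i} for i in range(n)}
    for u, v in reward_edges:
        reward_dependents[u].add(v)

    result = {}
    for i in range(n):
        affected = set()
        for k in reachable(influence, i):
            affected |= reward_dependents[k]
        result[i] = affected
    return result


def local_return(rewards, agent_set):
    """Sum only the rewards that matter for one agent's local value."""
    return sum(rewards[j] for j in agent_set)


# Toy example with four agents and sparse coupling.
sets = affected_reward_sets(
    n=4,
    state_edges=[(0, 1), (2, 3)],
    obs_edges=[],
    reward_edges=[(1, 2)],
)
rewards = [1.0, 2.0, 3.0, 4.0]
local_returns = {i: local_return(rewards, s) for i, s in sets.items()}
```

In this toy example, agent 3 only needs its own reward, while agent 0 needs the rewards of agents 0, 1, and 2. When the coupling graphs are sparse, such sets stay small, which is the intuition behind reducing the amount of information each agent must learn from; with dense graphs the sets approach the full agent set.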