This paper investigates continuous-time distributed dynamic programming (DP) algorithms for networked multi-agent Markov decision problems (MAMDPs). We adopt a distributed multi-agent framework in which each agent has access only to its own reward and has no knowledge of the rewards of other agents. Each agent can, however, share its parameters with neighboring agents over a communication network represented by a graph. We first introduce a novel distributed DP algorithm inspired by the distributed optimization method of Wang and Elia. We then introduce a second distributed DP algorithm, obtained through a decoupling process. The convergence of both DP algorithms is proved from a systems and control perspective. This study sets the stage for new distributed temporal difference learning algorithms.