Network Markov Decision Processes (MDPs), a popular model for multi-agent control, pose a significant challenge to efficient learning because the global state-action space grows exponentially with the number of agents. In this work, exploiting the exponential decay property of network dynamics, we first derive scalable spectral local representations for network MDPs, which induce a network linear subspace for the local $Q$-function of each agent. Building on these local spectral representations, we design a scalable algorithmic framework for continuous state-action network MDPs and provide end-to-end guarantees for the convergence of our algorithm. Empirically, we validate the effectiveness of our scalable representation-based approach on two benchmark problems, and demonstrate its advantages over generic function approximation approaches to representing the local $Q$-functions.