In centralized multi-agent systems, often modeled as multi-agent partially observable Markov decision processes (MPOMDPs), the action and observation spaces grow exponentially with the number of agents, rendering the value and belief estimation of single-agent online planning ineffective. Prior work partially addresses value estimation by exploiting the inherent structure of multi-agent settings via so-called coordination graphs. Additionally, belief estimation has been improved by incorporating the likelihood of observations into the approximation. However, these two challenges have only been tackled individually, which prevents existing methods from scaling to settings with many agents. We therefore address both challenges simultaneously. First, we introduce weighted particle filtering into a sample-based online planner for MPOMDPs. Second, we present a scalable approximation of the belief. Third, we exploit the typical locality of agent interactions in novel online planning algorithms for MPOMDPs that operate on a so-called sparse particle filter tree. Our experimental evaluation against several state-of-the-art baselines shows that our methods (1) are competitive in settings with only a few agents and (2) outperform the baselines in the presence of many agents.
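To illustrate the weighted particle filtering mentioned above, here is a minimal sketch of one belief-update step (propagate, reweight by observation likelihood, resample). The model interfaces `transition(s, a)` and `obs_likelihood(o, s, a)` are hypothetical placeholders, not the paper's implementation; in an MPOMDP, `a` and `o` would be joint actions and joint observations.

```python
# Minimal weighted particle filter belief update for a (M)POMDP.
# Assumed (hypothetical) model interfaces, not from the paper:
#   transition(s, a)        -> samples a successor state s' given state s, action a
#   obs_likelihood(o, s, a) -> probability of observation o given s' = s and action a
import random

def update_belief(particles, weights, action, observation,
                  transition, obs_likelihood, num_particles=100):
    """One weighted particle filter step: propagate, reweight, resample."""
    new_particles, new_weights = [], []
    for s, w in zip(particles, weights):
        # Propagate each particle through the transition model.
        s_next = transition(s, action)
        new_particles.append(s_next)
        # Reweight by the likelihood of the received observation.
        new_weights.append(w * obs_likelihood(observation, s_next, action))
    total = sum(new_weights)
    if total == 0.0:
        # Particle deprivation: no particle explains the observation;
        # fall back to uniform weights rather than dividing by zero.
        new_weights = [1.0 / len(new_particles)] * len(new_particles)
    else:
        new_weights = [w / total for w in new_weights]
    # Resample to return to an (approximately) unweighted particle set.
    resampled = random.choices(new_particles, weights=new_weights,
                               k=num_particles)
    return resampled, [1.0 / num_particles] * num_particles
```

Reweighting by the observation likelihood is what distinguishes this from plain rejection-based (unweighted) particle filtering, which discards particles whose sampled observation does not match exactly and degrades quickly as the joint observation space grows.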