In centralized multi-agent systems, often modeled as multi-agent partially observable Markov decision processes (MPOMDPs), the action and observation spaces grow exponentially with the number of agents. This makes the value and belief estimation used in single-agent online planning ineffective. Prior work partially addresses value estimation by exploiting the inherent structure of multi-agent settings through so-called coordination graphs. Separately, belief estimation has been improved by incorporating the likelihood of observations into the approximation. However, these two challenges have so far been tackled only individually, which prevents existing methods from scaling to many agents. We therefore address both simultaneously. First, we introduce weighted particle filtering to a sample-based online planner for MPOMDPs. Second, we present a scalable approximation of the belief. Third, we bring an approach that exploits the typical locality of agent interactions into novel online planning algorithms for MPOMDPs that operate on a so-called sparse particle filter tree. Our experimental evaluation against several state-of-the-art baselines shows that our methods (1) are competitive in settings with only a few agents and (2) improve over the baselines in the presence of many agents.
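To make "incorporating the likelihood of observations" concrete, here is a minimal, generic sketch of a weighted particle belief update. It is not the authors' planner. Each particle is propagated through a generative transition model and then reweighted by the likelihood of the received joint observation. This differs from schemes that keep only particles whose simulated observation exactly matches the real one, and such exact matches become rare as the joint observation space grows. The toy model is hypothetical: every agent holds one hidden bit, and observations are assumed conditionally independent across agents given the state, so the joint likelihood factors into a product. The names `weighted_belief_update`, `toy_transition`, and `toy_obs_likelihood` are illustrative only.

```python
import math
import random


def weighted_belief_update(particles, weights, joint_action, joint_obs,
                           transition, obs_likelihood, rng):
    """One step of a generic weighted particle filter (illustrative sketch).

    Particles are propagated through the generative model and reweighted by
    the observation likelihood instead of being rejected on a mismatch.
    """
    new_particles, new_weights = [], []
    for s, w in zip(particles, weights):
        s_next = transition(s, joint_action, rng)
        new_particles.append(s_next)
        new_weights.append(w * obs_likelihood(joint_obs, s_next, joint_action))
    total = sum(new_weights)
    if total == 0.0:
        raise ValueError("all particles have zero likelihood")
    # Normalize so the weights form a probability distribution.
    return new_particles, [w / total for w in new_weights]


def toy_transition(state, joint_action, rng):
    # Hypothetical dynamics: each agent's bit flips independently w.p. 0.1.
    return tuple(b ^ (rng.random() < 0.1) for b in state)


def toy_obs_likelihood(joint_obs, state, joint_action):
    # Assumed factored model: agent i sees its own bit correctly w.p. 0.8,
    # independently of the other agents, so the joint likelihood is a product.
    return math.prod(0.8 if o == b else 0.2 for o, b in zip(joint_obs, state))


# Usage: 3 agents, 200 particles drawn from a uniform prior.
rng = random.Random(0)
n_agents = 3
particles = [tuple(rng.randint(0, 1) for _ in range(n_agents)) for _ in range(200)]
weights = [1.0 / len(particles)] * len(particles)
particles, weights = weighted_belief_update(
    particles, weights, joint_action=None, joint_obs=(1, 0, 1),
    transition=toy_transition, obs_likelihood=toy_obs_likelihood, rng=rng)

# Posterior probability that agent 0's bit is 1 after observing it as 1.
p_agent0 = sum(w for s, w in zip(particles, weights) if s[0] == 1)
print(f"P(agent 0 bit = 1) ~ {p_agent0:.2f}")
```

Every particle keeps a nonzero weight here, so none is wasted, even though an exact match of all three observed bits is unlikely for many particles. A full planner would typically also resample when the weights degenerate; that step is omitted for brevity.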