The Common Information (CI) approach provides a systematic way to transform a multi-agent stochastic control problem into a single-agent partially observed Markov decision problem (POMDP) called the coordinator's POMDP. However, such a POMDP can be hard to solve because its action space is extraordinarily large. We propose a new algorithm for multi-agent stochastic control problems, called coordinator's heuristic search value iteration (CHSVI), that combines the CI approach with point-based POMDP algorithms for large action spaces. We demonstrate the algorithm by optimally solving several benchmark problems.
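To give a sense of why the coordinator's action space becomes so large, the following minimal sketch counts the coordinator's actions under an assumption not spelled out in the abstract: that, as in the usual formulation of the CI approach, each coordinator action is a tuple of prescriptions, one per agent, where a prescription maps that agent's private information to one of its actions. The function names and the example sizes are hypothetical and chosen only for illustration.

```python
from itertools import product


def count_prescriptions(num_private_states, num_actions):
    """Count joint prescriptions: product over agents of |A_i| ** |P_i|.

    Assumes each agent i has num_private_states[i] possible private
    information values and num_actions[i] primitive actions.
    """
    total = 1
    for p, a in zip(num_private_states, num_actions):
        total *= a ** p
    return total


def enumerate_prescriptions(private_states, actions):
    """List every map from private_states to actions for a single agent."""
    private_states = list(private_states)
    actions = list(actions)
    return [
        dict(zip(private_states, choice))
        for choice in product(actions, repeat=len(private_states))
    ]


# Hypothetical example: two agents, each with 4 private states and 3 actions.
joint = count_prescriptions([4, 4], [3, 3])
per_agent = enumerate_prescriptions(range(4), range(3))
print(joint, len(per_agent))
```

In this toy setting the agents have only 3 × 3 = 9 joint primitive actions, yet the coordinator chooses among 81 × 81 = 6561 joint prescriptions, and the count grows exponentially with the size of each agent's private information.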