Decentralized partially observable Markov decision processes with communication (Dec-POMDP-Com) provide a framework for multiagent decision making under uncertainty, but solving finite-horizon problems is NEXP-complete, rendering exact solutions intractable in general. Sharing both actions and observations reduces the complexity to PSPACE-complete; we propose an approach that bridges POMDPs and Dec-POMDPs by communicating only suggested joint actions, eliminating the need to share observations while retaining near-centralized performance. Each agent maintains a set of possible beliefs for every other agent and prunes the beliefs that are inconsistent with the suggested actions, yielding an estimated joint belief that can be paired with any centralized policy. Because this approach requires solving only a POMDP for each agent, it reduces computational complexity while preserving performance. We demonstrate its effectiveness on several Dec-POMDP benchmarks, achieving performance comparable to centralized methods when the shared actions enable effective belief pruning. This action-based communication framework also offers a natural avenue for human-agent cooperation, opening new directions for scalable multiagent planning under uncertainty in both autonomous systems and human-agent teams.
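To make the belief-pruning mechanism concrete, the sketch below shows one way an agent could track the possible beliefs of a teammate using only the teammate's suggested joint actions: candidate beliefs inconsistent with a suggestion are discarded, and the surviving set is expanded over the teammate's unobserved observations. This is a minimal illustration under stated assumptions, not the paper's actual implementation; the class name `PossibleBeliefTracker`, the callables `policy` and `update_belief`, and the toy values in the usage stub (loosely modeled on a Dec-Tiger-style benchmark) are all hypothetical.

```python
class PossibleBeliefTracker:
    """Tracks the set of beliefs another agent could plausibly hold,
    pruned by the joint actions that agent suggests (illustrative sketch)."""

    def __init__(self, initial_belief, observations, policy, update_belief):
        self.possible = [initial_belief]   # candidate beliefs for the other agent
        self.observations = observations   # the other agent's observation space
        self.policy = policy               # shared centralized policy: belief -> joint action
        self.update = update_belief        # Bayesian belief update: (b, a, o) -> b'

    def prune(self, suggested_action):
        # Keep only beliefs under which the shared centralized policy
        # would have produced the communicated joint action suggestion.
        self.possible = [b for b in self.possible
                         if self.policy(b) == suggested_action]

    def expand(self, executed_action):
        # Observations are never shared, so branch over every observation
        # the other agent could have received after the executed joint action.
        self.possible = [self.update(b, executed_action, o)
                         for b in self.possible
                         for o in self.observations]


# Example usage with toy stand-ins (hypothetical, for illustration only):
if __name__ == "__main__":
    policy = lambda b: "open-left" if b[0] > 0.5 else "listen"
    update = lambda b, a, o: b  # identity update as a placeholder
    tracker = PossibleBeliefTracker(
        initial_belief=(0.6, 0.4),
        observations=["hear-left", "hear-right"],
        policy=policy,
        update_belief=update,
    )
    tracker.prune("open-left")   # teammate suggested "open-left"
    tracker.expand("open-left")  # branch over its possible observations
    print(tracker.possible)      # -> [(0.6, 0.4), (0.6, 0.4)]
```

In this sketch, `prune` implements the key idea from the abstract: a suggested action is informative precisely because it rules out every belief from which the shared policy would have suggested something else. The intersection of the pruned possible-belief sets then serves as the estimated joint belief fed to the centralized policy.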