Decentralized partially observable Markov decision processes with communication (Dec-POMDP-Com) provide a framework for multiagent decision making under uncertainty, but their NEXP-complete complexity renders solutions intractable in general. Although sharing both actions and observations reduces the complexity to PSPACE-complete, we propose an approach that bridges POMDPs and Dec-POMDPs by communicating only suggested joint actions, eliminating the need to share observations while maintaining performance comparable to fully centralized planning and execution. Our algorithm estimates joint beliefs by using the shared actions to prune infeasible candidates: each agent maintains a set of possible beliefs for every other agent and discards those inconsistent with the suggested actions, yielding an estimated joint belief usable with any centralized policy. The approach requires solving only a POMDP for each agent, reducing computational cost while preserving performance. We demonstrate its effectiveness on several Dec-POMDP benchmarks, where it achieves performance comparable to centralized methods whenever the shared actions enable effective belief pruning. This action-based communication framework offers a natural avenue for integrating human-agent cooperation, opening new directions for scalable multiagent planning under uncertainty, with applications in both autonomous systems and human-agent teams.
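A minimal sketch of the expand-then-prune idea described above, under simplifying assumptions not spelled out in the abstract: a discrete model with transition probabilities `T[s][a][s2]`, observation probabilities `O[a][s2][o]`, and a centralized policy object exposing a hypothetical `suggested_action(belief)` method. All names and the data layout are illustrative, not the paper's API.

```python
# Sketch of action-based belief pruning (illustrative names, assumed model):
# T[s][a][s2]  - transition probability, O[a][s2][o] - observation probability,
# policy.suggested_action(belief) - joint action the centralized policy would
# suggest from a given belief (hypothetical interface).

def belief_update(belief, action, obs, T, O):
    """Standard Bayesian belief update for a single candidate belief."""
    n = len(belief)
    new_b = [O[action][s2][obs] *
             sum(T[s][action][s2] * belief[s] for s in range(n))
             for s2 in range(n)]
    z = sum(new_b)
    return [p / z for p in new_b] if z > 0 else None  # None: impossible branch

def expand_belief_set(beliefs, action, observations, T, O):
    """Grow another agent's possible-belief set: its observation is never
    shared, so branch on every observation it could have received."""
    expanded = []
    for b in beliefs:
        for o in observations:
            nb = belief_update(b, action, o, T, O)
            if nb is not None:
                expanded.append(nb)
    return expanded

def prune_by_suggested_action(beliefs, suggested, policy):
    """Keep only the candidate beliefs from which the centralized policy
    would have suggested the communicated joint action."""
    return [b for b in beliefs if policy.suggested_action(b) == suggested]
```

In the full algorithm as the abstract describes it, each agent would run this expand-then-prune step once per time step for every teammate and combine the surviving candidates into an estimated joint belief that is then fed to the centralized policy.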