Decentralized Monte Carlo Tree Search (Dec-MCTS) is widely used for cooperative multi-agent planning but struggles in environments with sparse or skewed rewards. We introduce Coordinated Boltzmann MCTS (CB-MCTS), which replaces deterministic UCT action selection with a stochastic Boltzmann policy augmented by a decaying entropy bonus, sustaining exploration early while focusing it over time. Although Boltzmann exploration has been studied in single-agent MCTS, applying it in multi-agent systems poses unique coordination challenges; CB-MCTS is, to our knowledge, the first method to address them. We analyze CB-MCTS in the simple-regret setting and show in simulation that it outperforms Dec-MCTS in deceptive-reward scenarios while remaining competitive on standard benchmarks, providing a robust approach to multi-agent planning.
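The core mechanism, a Boltzmann (softmax) selection policy whose entropy weight decays with visit count, can be sketched as follows. This is a minimal illustrative sketch, not the paper's exact formulation: the `1/sqrt(N)` decay schedule and the parameter `lam0` are assumptions, since the abstract does not specify them.

```python
import math
import random

def boltzmann_policy(q_values, total_visits, lam0=0.5):
    """Softmax policy over child Q-values with a decaying entropy weight.

    Maximizing  E_pi[Q] + lam * H(pi)  over distributions pi yields the
    Boltzmann policy  pi(a) ∝ exp(Q(a) / lam),  so decaying lam with the
    node's visit count gives broad exploration early and sharper
    exploitation late. The 1/sqrt(N) schedule and lam0 are illustrative
    assumptions, not the paper's exact choices.
    """
    lam = lam0 / math.sqrt(1.0 + total_visits)
    m = max(q_values)  # subtract the max for numerical stability
    exps = [math.exp((q - m) / lam) for q in q_values]
    z = sum(exps)
    return [e / z for e in exps]

def sample_action(probs, rng=random):
    """Draw a child index from the Boltzmann policy (stochastic,
    unlike deterministic argmax-style UCT selection)."""
    r, cum = rng.random(), 0.0
    for i, p in enumerate(probs):
        cum += p
        if r < cum:
            return i
    return len(probs) - 1
```

In a multi-agent setting each agent would apply such a policy to its own tree while exchanging compressed plan distributions, as in Dec-MCTS; the stochastic selection keeps probability mass on rarely rewarded branches that deterministic UCT can abandon under sparse rewards.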