Designing autonomous agents that can interact effectively with other agents without prior coordination is a core problem in multi-agent systems. Type-based reasoning methods address this problem by maintaining a belief over a set of potential behaviours for the other agents. However, current methods are limited in that they either assume full observability of the state and actions of the other agents, or do not scale efficiently to larger problems with longer planning horizons. To address these limitations, we propose Partially Observable Type-based Meta Monte-Carlo Planning (POTMMCP), an online Monte-Carlo Tree Search-based planning method for type-based reasoning in large partially observable environments. POTMMCP incorporates a novel meta-policy for guiding search and evaluating beliefs, allowing it to search more effectively to longer horizons using less planning time. We show that our method converges to the optimal solution in the limit and empirically demonstrate that it effectively adapts online to diverse sets of other agents across a range of environments. Comparisons with the state-of-the-art method on problems with up to $10^{14}$ states and $10^8$ observations indicate that POTMMCP is able to compute better solutions significantly faster.
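
To make the idea of "maintaining a belief over a set of potential behaviours" concrete, the following is a minimal sketch of a Bayesian belief update over a fixed set of types. It is not the POTMMCP algorithm: it assumes the other agent's action is directly observed, which is exactly the simplification the proposed method is meant to relax. The function name `update_type_belief`, the type names, and the policy probabilities are all hypothetical and chosen for illustration only.

```python
from typing import Callable, Dict

# Hypothetical representation: each type is a policy that returns the
# probability of taking `action` given the interaction `history`.
Policy = Callable[[tuple, str], float]


def update_type_belief(
    belief: Dict[str, float],
    policies: Dict[str, Policy],
    history: tuple,
    observed_action: str,
) -> Dict[str, float]:
    """Bayes update of the belief over types after one observed action.

    Assumes the other agent's action is fully observed (an illustrative
    simplification, not the partially observable setting of the paper).
    """
    # Unnormalised posterior: prior times likelihood of the observed action.
    unnormalised = {
        name: prior * policies[name](history, observed_action)
        for name, prior in belief.items()
    }
    total = sum(unnormalised.values())
    if total == 0.0:
        # No type explains the observation; keep the prior unchanged.
        return dict(belief)
    return {name: weight / total for name, weight in unnormalised.items()}


# Two hypothetical types with fixed, history-independent action preferences.
policies = {
    "prefers_left": lambda history, action: 0.9 if action == "left" else 0.1,
    "prefers_right": lambda history, action: 0.2 if action == "left" else 0.8,
}
prior = {"prefers_left": 0.5, "prefers_right": 0.5}
posterior = update_type_belief(prior, policies, history=(), observed_action="left")
```

Starting from a uniform prior, observing `"left"` shifts the belief towards `prefers_left` (to 9/11 in this example). Under partial observability the action likelihood would have to be replaced by the likelihood of the planning agent's own observation, which is what makes the problem harder.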