Multi-agent reinforcement learning (MARL) methods struggle with the non-stationarity of multi-agent systems and fail to adaptively learn online when tested with novel agents. Here, we leverage large language models (LLMs) to create an autonomous agent that can handle these challenges. Our agent, Hypothetical Minds, consists of a cognitively inspired architecture, featuring modular components for perception, memory, and hierarchical planning over two levels of abstraction. We introduce a Theory of Mind module that scaffolds the high-level planning process by generating hypotheses about other agents' strategies in natural language. The module then evaluates and iteratively refines these hypotheses by reinforcing those that make correct predictions about the other agents' behavior. Hypothetical Minds significantly improves performance over previous LLM-agent and RL baselines on a range of competitive, mixed-motive, and collaborative domains in the Melting Pot benchmark, including both dyadic and population-based environments. Additionally, comparisons against LLM-agent baselines and ablations reveal the importance of hypothesis evaluation and refinement for success in complex scenarios.
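The hypothesis evaluation step described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the authors' implementation: the names `Hypothesis`, `update`, and `best`, the learning rate, and the running-average update rule are hypothetical, and the hypotheses are hard-coded here, whereas in the described system they would be generated in natural language by an LLM.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Hypothesis:
    """A candidate guess about another agent's strategy (hypothetical structure)."""
    description: str                      # natural-language statement of the strategy
    predict: Callable[[List[str]], str]   # maps the observed history to a predicted next move
    value: float = 0.0                    # running score; higher means better predictions


def update(hypotheses: List[Hypothesis], history: List[str], observed: str,
           lr: float = 0.3) -> None:
    """Reinforce hypotheses whose prediction matched the observed move.

    Assumed update rule: move each value toward 1.0 on a correct
    prediction and toward 0.0 on an incorrect one.
    """
    for h in hypotheses:
        correct = 1.0 if h.predict(history) == observed else 0.0
        h.value += lr * (correct - h.value)


def best(hypotheses: List[Hypothesis]) -> Hypothesis:
    """Return the hypothesis with the highest current value."""
    return max(hypotheses, key=lambda h: h.value)


# Toy example: two hand-written hypotheses about an opponent's moves.
hypotheses = [
    Hypothesis("always plays rock", lambda hist: "rock"),
    Hypothesis("repeats its previous move", lambda hist: hist[-1] if hist else "rock"),
]

history: List[str] = []
for move in ["rock", "paper", "paper", "paper"]:
    update(hypotheses, history, move)   # score predictions made before seeing the move
    history.append(move)

print(best(hypotheses).description)
```

In this toy run the second hypothesis ends with the higher value, so a high-level planner would condition on it. A refinement step, in which poorly scoring hypotheses are rewritten or replaced, is omitted from the sketch.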