Can emergent language models faithfully model the intelligence of decision-making agents? Although modern language models already exhibit some reasoning ability and can, in principle, express any probability distribution over tokens, it remains underexplored how the world knowledge these pretrained models have memorized can be leveraged to comprehend an agent's behaviour in the physical world. This study empirically examines, for the first time, how well large language models (LLMs) can build a mental model of agents, termed agent mental modelling, by reasoning about an agent's behaviour and its effect on states from the agent's interaction history. This research may unveil the potential of leveraging LLMs to elucidate RL agent behaviour, addressing a key challenge in eXplainable reinforcement learning (XRL). To this end, we propose specific evaluation metrics and test them on selected RL task datasets of varying complexity, reporting findings on agent mental model establishment. Our results show that LLMs are not yet capable of fully mentally modelling agents through inference alone, without further innovations. This work thus provides new insights into the capabilities and limitations of modern LLMs.