Large Language Models (LLMs) trained on massive corpora have shown remarkable success in knowledge-intensive tasks. Yet most of them rely on pre-stored knowledge. Inducing new general knowledge from a specific environment and reasoning with the acquired knowledge, which we call \textit{situated inductive reasoning}, is both crucial and challenging for machine intelligence. In this paper, we design Mars, an interactive environment for situated inductive reasoning. It introduces counter-commonsense game mechanisms by modifying terrain, survival settings, and task dependencies while adhering to certain principles. In Mars, agents need to actively interact with their surroundings, derive useful rules, and perform decision-making tasks in specific contexts. We conduct experiments on various RL-based and LLM-based methods and find that they all struggle on this challenging situated inductive reasoning benchmark. Furthermore, we explore \textit{Induction from Reflection}, in which we instruct agents to perform inductive reasoning over their historical trajectories. Its superior performance underscores the importance of inductive reasoning in Mars. Through Mars, we aim to galvanize advancements in situated inductive reasoning and set the stage for developing the next generation of AI systems that can reason in an adaptive and context-sensitive way.