While large language models (LLMs) have shown promising capabilities as zero-shot planners for embodied agents, their inability to learn from experience and build persistent mental models limits their robustness in complex open-world environments like Minecraft. We introduce MINDSTORES, an experience-augmented planning framework that enables embodied agents to build and leverage mental models through natural interaction with their environment. Drawing inspiration from how humans construct and refine cognitive mental models, our approach extends existing zero-shot LLM planning by maintaining a database of past experiences that informs future planning iterations. The key innovation is representing accumulated experiences as natural language embeddings of (state, task, plan, outcome) tuples, which an LLM planner can efficiently retrieve and reason over to generate insights and guide plan refinement for novel states and tasks. Through extensive experiments in MineDojo, a Minecraft-based simulation environment that provides low-level agent controls, we find that MINDSTORES learns and applies its knowledge significantly better than existing memory-based LLM planners while retaining the flexibility and generalization benefits of zero-shot approaches. These results represent an important step toward more capable embodied AI systems that can learn continuously through natural experience.
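The experience-retrieval mechanism described above can be sketched minimally as follows. This is an illustrative toy, not the paper's implementation: the `ExperienceDB` class, the bag-of-words `embed` function, and the example experiences are all hypothetical stand-ins (a real system would use a learned sentence-embedding model and feed the retrieved tuples into the LLM planner's prompt).

```python
import math
import re
from collections import Counter


def embed(text):
    # Toy bag-of-words "embedding" standing in for a real sentence encoder;
    # the paper's actual embedding model is not specified here.
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    # Cosine similarity between two sparse bag-of-words vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class ExperienceDB:
    """Stores (state, task, plan, outcome) tuples keyed by text embeddings."""

    def __init__(self):
        self.entries = []  # list of (embedding, experience tuple)

    def add(self, state, task, plan, outcome):
        text = " ".join([state, task, plan, outcome])
        self.entries.append((embed(text), (state, task, plan, outcome)))

    def retrieve(self, state, task, k=1):
        # Return the k most similar past experiences for the current state
        # and task; these would be given to the LLM planner as context.
        query = embed(state + " " + task)
        ranked = sorted(self.entries,
                        key=lambda entry: cosine(query, entry[0]),
                        reverse=True)
        return [exp for _, exp in ranked[:k]]


db = ExperienceDB()
db.add("night, no shelter", "survive the night", "dig into hillside", "success")
db.add("forest, daytime", "collect wood", "punch trees", "success")

# Retrieval for a novel but related state surfaces the relevant experience.
print(db.retrieve("night approaching, open plains", "survive the night"))
```

The design choice this illustrates is that experiences are stored and matched purely in natural-language space, so no task-specific schema is needed and the same store generalizes across novel states and tasks.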