While large language models (LLMs) are increasingly deployed for language understanding and interactive decision-making, their strong performance stems largely from the comprehensive, in-depth domain knowledge embedded within them. The extent of this knowledge, however, varies across domains. Existing methods often assume that LLMs already possess such knowledge of their environment, overlooking potential gaps in their understanding of actual world dynamics. To address this gap, we introduce Discover, Verify, and Evolve (DiVE), a framework that discovers world dynamics from a small number of demonstrations, verifies their correctness, and evolves new, advanced dynamics tailored to the current situation. Through extensive evaluations, we analyze the impact of each component on performance and compare the dynamics automatically generated by DiVE with human-annotated world dynamics. Our results demonstrate that LLMs guided by DiVE make better decisions, achieving rewards comparable to human players in the Crafter environment.
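The discover-verify-evolve loop can be illustrated with a minimal, hypothetical sketch. The rule representation (a `(action, precondition, effect)` triple over set-valued states) and all function names below are illustrative assumptions for a toy crafting domain, not the paper's actual implementation, which uses an LLM to propose and refine dynamics.

```python
# Hypothetical sketch of a Discover-Verify-Evolve loop over toy transitions.
# States are sets of items; a dynamic is an (action, precondition, effect) rule.

def discover(demonstrations):
    """Discover: propose candidate dynamics from observed transitions."""
    candidates = set()
    for state, action, next_state in demonstrations:
        for new_item in next_state - state:
            candidates.add((action, frozenset(state), new_item))
    return candidates

def verify(candidates, demonstrations):
    """Verify: keep only rules that never contradict a demonstration."""
    verified = set()
    for action, precond, effect in candidates:
        consistent = all(
            effect in ns
            for s, a, ns in demonstrations
            if a == action and precond <= s
        )
        if consistent:
            verified.add((action, precond, effect))
    return verified

def evolve(verified, situation):
    """Evolve: retain only dynamics applicable to the current situation."""
    return {rule for rule in verified if rule[1] <= situation}

# Toy demonstrations as (state, action, next_state) transitions.
demos = [
    (frozenset({"wood"}), "craft_table",
     frozenset({"wood", "table"})),
    (frozenset({"wood", "table"}), "craft_pickaxe",
     frozenset({"wood", "table", "pickaxe"})),
]
rules = verify(discover(demos), demos)
applicable = evolve(rules, frozenset({"wood"}))
```

In this toy version, `evolve` merely filters verified rules to the current situation; in DiVE, this stage instead generates new, higher-level dynamics from the verified ones.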