In this paper, we study the problem of planning in Minecraft, a popular, widely accessible, yet challenging open-ended environment for developing multi-task embodied agents. We identify two primary challenges in empowering such agents with planning: 1) planning in an open-ended world like Minecraft requires precise, multi-step reasoning because the tasks are long-horizon, and 2) because vanilla planners do not consider how close each sub-goal is to the agent's current state when ordering parallel sub-goals within a complicated plan, the resulting plan can be inefficient. To this end, we propose "Describe, Explain, Plan and Select" (DEPS), an interactive planning approach based on Large Language Models (LLMs). Our approach enables better error correction from feedback during long-horizon planning, and it introduces a sense of proximity through a goal Selector, a learnable module that ranks parallel sub-goals by their estimated steps to completion and refines the original plan accordingly. Our experiments mark a milestone: the first multi-task agent that can robustly accomplish 70+ Minecraft tasks, nearly doubling overall performance. Finally, ablation and exploratory studies detail how our design outperforms its counterparts and provide a promising update on the $\texttt{ObtainDiamond}$ grand challenge. The code is released at https://github.com/CraftJarvis/MC-Planner.
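To make the idea of ranking parallel sub-goals by estimated steps to completion concrete, the following is a minimal sketch and not the released implementation. The function names (`select_next_goal`, `order_plan`), the prerequisite dictionary, and the step estimates are all hypothetical; in particular, a fixed lookup table stands in for the learnable goal Selector, which would estimate the remaining steps from the agent's current state.

```python
from typing import Callable, Dict, List, Set


def select_next_goal(ready_goals: List[str],
                     estimate_steps: Callable[[str], float]) -> str:
    """Pick the ready sub-goal with the fewest estimated steps to completion."""
    if not ready_goals:
        raise ValueError("no sub-goal is currently achievable")
    # min() returns the first minimum, so ties keep the original plan order.
    return min(ready_goals, key=estimate_steps)


def order_plan(prerequisites: Dict[str, Set[str]],
               estimate_steps: Callable[[str], float]) -> List[str]:
    """Greedily order a plan: among sub-goals whose prerequisites are
    already done (the "parallel" ones), always take the closest first."""
    done: List[str] = []
    remaining = dict(prerequisites)
    while remaining:
        finished = set(done)
        ready = [g for g, deps in remaining.items() if deps <= finished]
        if not ready:
            raise ValueError("plan has cyclic or unsatisfiable prerequisites")
        nxt = select_next_goal(ready, estimate_steps)
        done.append(nxt)
        del remaining[nxt]
    return done


# Hypothetical plan: each sub-goal maps to the sub-goals it depends on.
example_plan = {
    "log": set(),
    "wool": set(),
    "wooden_pickaxe": {"log"},
    "cobblestone": {"wooden_pickaxe"},
}
# Made-up step estimates standing in for a learned predictor.
example_steps = {"log": 40, "wool": 15, "wooden_pickaxe": 5, "cobblestone": 60}

if __name__ == "__main__":
    print(order_plan(example_plan, example_steps.__getitem__))
```

In this toy plan, `log` and `wool` are both achievable at the start, so the sketch takes `wool` first because its estimate is lower. A real Selector would presumably re-estimate after each sub-goal, since the agent's surroundings change as it acts; the static table here ignores that.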