LLMs have shown promising results in task planning due to their strong natural language understanding and reasoning capabilities. However, issues such as hallucinations, ambiguities in human instructions, environmental constraints, and limitations in the executing agent's capabilities often lead to flawed or incomplete plans. This paper proposes MultiTalk, an LLM-based task planning methodology that addresses these issues through a framework of introspective and extrospective dialogue loops. This approach helps ground generated plans in the context of the environment and the agent's capabilities, while also resolving uncertainties and ambiguities in the given task. These loops are enabled by specialized systems designed to extract and predict task-specific states, and flag mismatches or misalignments among the human user, the LLM agent, and the environment. Effective feedback pathways between these systems and the LLM planner foster meaningful dialogue. The efficacy of this methodology is demonstrated through its application to robotic manipulation tasks. Experiments and ablations highlight the robustness and reliability of our method, and comparisons with baselines further illustrate the superiority of MultiTalk in task planning for embodied agents.