For effective human-robot interaction, robots need to understand, plan, and execute complex, long-horizon tasks described in natural language. Recent advances in large language models (LLMs) have shown promise for translating natural language into robot action sequences for such tasks. However, existing approaches either translate the natural language directly into robot trajectories or factor the inference process by decomposing language into task sub-goals and relying on a motion planner to execute each sub-goal. When complex environmental and temporal constraints are involved, inference over planning tasks must be performed jointly with motion plans using traditional task-and-motion planning (TAMP) algorithms, making factorization into sub-goals untenable. Rather than using LLMs to directly plan task sub-goals, we perform few-shot translation from natural language task descriptions to an intermediate task representation that a TAMP algorithm can then consume to jointly solve the task and motion plan. To improve translation, we automatically detect and correct both syntactic and semantic errors via autoregressive re-prompting, resulting in significant improvements in task completion. We show that our approach outperforms several methods using LLMs as planners in complex task domains. See our project website https://yongchao98.github.io/MIT-REALM-AutoTAMP/ for prompts, videos, and code.
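The detect-and-correct loop described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy parenthesis check standing in for syntactic error detection, the stubbed LLM, and the example output string are all hypothetical, and the intermediate task representation is treated as an opaque string. Semantic error detection, which would likely need a further model call, is omitted.

```python
from typing import Callable, Optional


def check_parentheses(formula: str) -> Optional[str]:
    """Toy syntactic check (hypothetical): return an error message, or None if balanced."""
    depth = 0
    for i, ch in enumerate(formula):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return f"unmatched ')' at position {i}"
    return None if depth == 0 else f"{depth} unclosed '('"


def translate_with_reprompt(
    task: str,
    llm: Callable[[str], str],
    checker: Callable[[str], Optional[str]],
    max_rounds: int = 3,
) -> Optional[str]:
    """Ask the LLM for a translation; on a detected error, re-prompt with the error appended."""
    prompt = task
    for _ in range(max_rounds):
        candidate = llm(prompt)
        error = checker(candidate)
        if error is None:
            return candidate  # passed the check; ready to hand to a planner
        # Feed the failed attempt and the error message back to the model.
        prompt = (
            f"{task}\nPrevious attempt: {candidate}\n"
            f"Error: {error}\nPlease fix it."
        )
    return None  # no valid translation within the round budget


def stub_llm(prompt: str) -> str:
    """Stand-in for a real LLM call (assumption): fixes its output once it sees an error."""
    if "Error:" in prompt:
        return "finally(enter(room_a))"
    return "finally(enter(room_a)"  # first attempt has a missing ')'


result = translate_with_reprompt("go to room a", stub_llm, check_parentheses)
```

In this sketch the first attempt fails the check, the error message is appended to the prompt, and the second attempt is accepted. A real system would replace `stub_llm` with a few-shot prompted model and `check_parentheses` with a parser for the chosen task representation.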