For effective human-robot interaction, robots need to understand, plan, and execute complex, long-horizon tasks described in natural language. Recent advances in large language models (LLMs) have shown promise for translating natural language into robot action sequences for complex tasks. However, many existing approaches either translate natural language directly into robot trajectories or factor the inference process by decomposing language into task sub-goals and then relying on a motion planner to execute each sub-goal. When complex environmental and temporal constraints are involved, inference over planning tasks must be performed jointly with motion plans using traditional task-and-motion planning (TAMP) algorithms, which makes such factorization untenable. Rather than using LLMs to plan task sub-goals directly, we perform few-shot translation from natural language task descriptions to an intermediate task representation that a TAMP algorithm can then consume to jointly solve the task and motion plan. To improve translation, we automatically detect and correct both syntactic and semantic errors via autoregressive re-prompting, which yields significant improvements in task completion. We show that our approach outperforms several methods that use LLMs as planners in complex task domains.
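The translate-then-correct loop described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the toy parenthesized representation, the operator vocabulary `KNOWN_OPS`, the helper names `syntax_error` and `translate_with_reprompting`, and the `llm` and `semantic_error` callables, which stand in for a real model call and a real semantic checker. Few-shot examples would be part of the initial prompt and are omitted here.

```python
from typing import Callable, Optional

# Hypothetical operator vocabulary for a toy intermediate representation.
# The abstract does not name the actual representation; this is illustrative only.
KNOWN_OPS = {"and", "or", "not", "eventually", "always", "until", "visit", "avoid"}


def syntax_error(spec: str) -> Optional[str]:
    """Return a description of the first syntax problem, or None if none is found."""
    tokens = spec.replace("(", " ( ").replace(")", " ) ").split()
    if not tokens:
        return "empty specification"
    depth = 0
    for i, tok in enumerate(tokens):
        if tok == "(":
            depth += 1
            # In this toy grammar, the token after "(" must be a known operator.
            nxt = tokens[i + 1] if i + 1 < len(tokens) else "<end>"
            if nxt not in KNOWN_OPS:
                return f"unknown operator after '(': {nxt}"
        elif tok == ")":
            depth -= 1
            if depth < 0:
                return "unmatched ')'"
    if depth != 0:
        return "unmatched '('"
    return None


def translate_with_reprompting(
    instruction: str,
    llm: Callable[[str], str],  # assumed interface: prompt in, text out
    semantic_error: Callable[[str, str], Optional[str]],  # assumed checker
    max_rounds: int = 3,
) -> Optional[str]:
    """Ask the model for a specification; on error, re-prompt with the feedback."""
    prompt = f"Translate to a task specification:\n{instruction}"
    for _ in range(max_rounds):
        spec = llm(prompt).strip()
        # Syntax is checked first; semantics only if the spec parses.
        problem = syntax_error(spec) or semantic_error(instruction, spec)
        if problem is None:
            return spec  # ready to hand to a TAMP solver
        # Each correction prompt carries all earlier attempts and their errors.
        prompt += f"\nPrevious attempt: {spec}\nProblem: {problem}\nPlease correct it."
    return None  # no valid specification within the round budget
```

In this sketch the returned string is what a downstream TAMP solver would consume, and a `None` result signals that translation failed within the allotted rounds. Keeping the checkers separate from the model call means either one can be swapped out without touching the loop.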