We propose Text2Motion, a language-based planning framework enabling robots to solve sequential manipulation tasks that require long-horizon reasoning. Given a natural language instruction, our framework constructs both a task- and motion-level plan that is verified to reach inferred symbolic goals. Text2Motion uses feasibility heuristics encoded in Q-functions of a library of skills to guide task planning with Large Language Models. Whereas previous language-based planners only consider the feasibility of individual skills, Text2Motion actively resolves geometric dependencies spanning skill sequences by performing geometric feasibility planning during its search. We evaluate our method on a suite of problems that require long-horizon reasoning, interpretation of abstract goals, and handling of partial affordance perception. Our experiments show that Text2Motion can solve these challenging problems with a success rate of 82%, while prior state-of-the-art language-based planning methods only achieve 13%. Text2Motion thus provides promising generalization characteristics to semantically diverse sequential manipulation tasks with geometric dependencies between skills.
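As a rough illustration of the idea that Q-function feasibility heuristics guide language-based task planning, the following minimal sketch ranks candidate skill sequences by combining a language-model usefulness score with a sequence-level feasibility estimate. It is a toy under stated assumptions, not the authors' implementation: the names `plan_feasibility`, `best_plan`, and `toy_q_value`, the use of a product of per-skill Q-values as the sequence feasibility, the rejection threshold, and all numeric values are hypothetical and chosen only to show how a dependency between skills can change which plan is selected.

```python
from math import prod
from typing import Callable, Dict, Optional, Sequence, Tuple

Plan = Tuple[str, ...]


def plan_feasibility(plan: Plan, q_value: Callable[[Plan, int], float]) -> float:
    """Estimate the feasibility of a whole plan as the product of per-skill
    Q-values (an assumption of this sketch). Each Q-value may depend on the
    skills that precede it, which is how dependencies across skills enter."""
    return prod(q_value(plan, i) for i in range(len(plan)))


def best_plan(
    candidates: Sequence[Plan],
    usefulness: Dict[Plan, float],
    q_value: Callable[[Plan, int], float],
    threshold: float = 0.5,
) -> Optional[Plan]:
    """Return the candidate with the highest usefulness * feasibility score,
    discarding plans whose feasibility falls below the threshold."""
    scored = []
    for plan in candidates:
        feasibility = plan_feasibility(plan, q_value)
        if feasibility >= threshold:
            scored.append((usefulness[plan] * feasibility, plan))
    return max(scored)[1] if scored else None


def toy_q_value(plan: Plan, i: int) -> float:
    """Hypothetical Q-function: the box starts out of reach, so picking it
    is only likely to succeed after it has been pulled closer with the hook."""
    if plan[i] == "pick(box)":
        return 0.9 if "pull(box, hook)" in plan[:i] else 0.1
    return 0.95


# Two candidate plans for the instruction "put the box on the rack".
SHORT_PLAN: Plan = ("pick(box)", "place(box, rack)")
LONG_PLAN: Plan = (
    "pick(hook)",
    "pull(box, hook)",
    "place(hook, table)",
    "pick(box)",
    "place(box, rack)",
)

# Made-up language-model usefulness scores: the shorter plan looks better
# when feasibility is ignored.
USEFULNESS = {SHORT_PLAN: 0.6, LONG_PLAN: 0.4}

CHOSEN = best_plan([SHORT_PLAN, LONG_PLAN], USEFULNESS, toy_q_value)
print(CHOSEN)
```

In this toy setting the shorter plan is preferred by the usefulness score alone, but it is rejected because its first skill is unlikely to succeed; the longer plan, which first pulls the box within reach, is selected instead. Every skill in both plans is plausible when considered in isolation, so the difference only appears once feasibility is evaluated over the whole sequence.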