We propose Text2Motion, a language-based planning framework enabling robots to solve sequential manipulation tasks that require long-horizon reasoning. Given a natural language instruction, our framework constructs both a task-level and a motion-level plan, each verified to reach inferred symbolic goals. Text2Motion uses feasibility heuristics encoded in the Q-functions of a library of skills to guide task planning with Large Language Models. Whereas previous language-based planners consider only the feasibility of individual skills, Text2Motion actively resolves geometric dependencies spanning skill sequences by performing geometric feasibility planning during its search. We evaluate our method on a suite of problems that require long-horizon reasoning, interpretation of abstract goals, and handling of partial affordance perception. Our experiments show that Text2Motion solves these challenging problems with a success rate of 82%, while prior state-of-the-art language-based planning methods achieve only 13%. Text2Motion thus provides promising generalization characteristics to semantically diverse sequential manipulation tasks with geometric dependencies between skills.
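To make the idea of Q-function-guided task planning concrete, the following is a minimal illustrative sketch, not the implementation described in the paper. It assumes each candidate skill sequence carries a hypothetical language-model score and hypothetical per-skill Q-values in [0, 1], treats the product of those Q-values as a sequence-level feasibility estimate, and picks the feasible candidate with the best combined score. The names `Candidate`, `sequence_feasibility`, `select_plan`, the threshold, and all numbers are assumptions made for illustration.

```python
from dataclasses import dataclass
from math import prod


@dataclass
class Candidate:
    skills: list[str]       # symbolic skill sequence proposed by a language model
    llm_score: float        # hypothetical language-model score in [0, 1]
    q_values: list[float]   # hypothetical per-skill Q-values in [0, 1]


def sequence_feasibility(q_values: list[float]) -> float:
    """Assumed heuristic: a sequence is only as feasible as the product of its steps."""
    return prod(q_values)


def select_plan(candidates: list[Candidate], min_feasibility: float = 0.5):
    """Drop candidates below the feasibility threshold, then rank the rest
    by llm_score * feasibility. Returns None if nothing is feasible."""
    feasible = [
        c for c in candidates
        if sequence_feasibility(c.q_values) >= min_feasibility
    ]
    if not feasible:
        return None
    return max(
        feasible,
        key=lambda c: c.llm_score * sequence_feasibility(c.q_values),
    )


# Illustrative (made-up) candidates for "put the box on the rack".
direct = Candidate(
    skills=["pick(box)", "place(box, rack)"],
    llm_score=0.9,
    q_values=[0.9, 0.2],  # the second step is geometrically unlikely to succeed
)
with_hook = Candidate(
    skills=["pull(box, hook)", "pick(box)", "place(box, rack)"],
    llm_score=0.7,
    q_values=[0.9, 0.9, 0.8],
)
candidates = [direct, with_hook]

best = select_plan(candidates)
print(best.skills if best else "no feasible plan")
```

In this toy setting the shorter plan has the higher language-model score but is rejected because one of its steps has a low Q-value, which mirrors the abstract's point that feasibility has to be judged over the whole skill sequence rather than per skill.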