Multi-object rearrangement is a crucial skill for service robots, and it frequently requires commonsense reasoning. However, achieving commonsense arrangements requires knowledge about objects, which is hard to transfer to robots. Large language models (LLMs) are one potential source of this knowledge, but on their own they do not capture information about plausible physical arrangements of the world. We propose LLM-GROP, which uses prompting to extract commonsense knowledge about semantically valid object configurations from an LLM and instantiates them with a task and motion planner in order to generalize to varying scene geometry. LLM-GROP allows us to go from natural-language commands to human-aligned object rearrangement in varied environments. In human evaluations, our approach receives the highest rating; it also outperforms competitive baselines in success rate while maintaining comparable cumulative action costs. Finally, we demonstrate a practical implementation of LLM-GROP on a mobile manipulator in real-world scenarios. Supplementary materials are available at: https://sites.google.com/view/llm-grop