Multi-object rearrangement is a crucial skill for service robots, and it frequently requires commonsense reasoning. However, achieving commonsense arrangements requires knowledge about objects, which is hard to transfer to robots. Large language models (LLMs) are one potential source of this knowledge, but they do not directly capture information about plausible physical arrangements of the world. We propose LLM-GROP, which uses prompting to extract commonsense knowledge about semantically valid object configurations from an LLM and instantiates them with a task and motion planner in order to generalize to varying scene geometry. LLM-GROP allows us to go from natural-language commands to human-aligned object rearrangement in varied environments. In human evaluations, our approach achieves the highest rating and outperforms competitive baselines in success rate while maintaining comparable cumulative action costs. Finally, we demonstrate a practical implementation of LLM-GROP on a mobile manipulator in real-world scenarios. Supplementary materials are available at: https://sites.google.com/view/llm-grop