Multi-object rearrangement is a crucial skill for service robots, and it frequently requires commonsense reasoning. However, achieving commonsense arrangements requires knowledge about objects, which is hard to transfer to robots. Large language models (LLMs) are one potential source of this knowledge, but they do not directly capture information about plausible physical arrangements of the world. We propose LLM-GROP, which uses prompting to extract commonsense knowledge about semantically valid object configurations from an LLM and instantiates them with a task and motion planner in order to generalize to varying scene geometry. LLM-GROP allows us to go from natural-language commands to human-aligned object rearrangement in varied environments. In human evaluations, our approach achieves the highest rating, and it outperforms competitive baselines in success rate while maintaining comparable cumulative action costs. Finally, we demonstrate a practical implementation of LLM-GROP on a mobile manipulator in real-world scenarios. Supplementary materials are available at: https://sites.google.com/view/llm-grop