Multi-object rearrangement is a crucial skill for service robots, and it frequently requires commonsense reasoning. However, achieving commonsense arrangements requires knowledge about objects, which is hard to transfer to robots. Large language models (LLMs) are one potential source of this knowledge, but they do not directly capture information about plausible physical arrangements of the world. We propose LLM-GROP, which uses prompting to extract commonsense knowledge about semantically valid object configurations from an LLM and instantiates them with a task and motion planner in order to generalize to varying scene geometry. LLM-GROP allows us to go from natural-language commands to human-aligned object rearrangement in varied environments. In human evaluations, our approach achieves the highest rating and outperforms competitive baselines in success rate, while maintaining comparable cumulative action costs. Finally, we demonstrate a practical implementation of LLM-GROP on a mobile manipulator in real-world scenarios. Supplementary materials are available at: https://sites.google.com/view/llm-grop