Clothes manipulation is a critical skill for household robots. Recent work has advanced task-specific clothes manipulation, such as folding, flattening, and hanging. However, because clothes have complex geometries and are deformable, building a general-purpose robot system that can manipulate a diverse range of clothes in many ways remains challenging. Since clothes are typically designed with specific structures, we propose identifying structural features such as ``left sleeve'' as semantic keypoints. Semantic keypoints can provide semantic cues for task planning and geometric cues for low-level action generation. Building on this insight, we develop a hierarchical learning framework that uses a large language model (LLM) for general-purpose CLothes mAnipulation with Semantic keyPoints (CLASP). Extensive simulation experiments show that CLASP outperforms baseline methods on both seen and unseen clothes manipulation tasks. Real-world experiments show that CLASP can be deployed directly in the real world and applied to a wide variety of clothes.