Understanding and manipulating deformable objects (e.g., ropes and fabrics) is an essential yet challenging task with broad applications. The difficulty stems from the complex states and dynamics, diverse configurations, and high-dimensional action spaces of deformable objects. Moreover, manipulation tasks usually take multiple steps to accomplish, and greedy policies can easily get stuck in locally optimal states. Existing studies usually tackle this problem with reinforcement learning or by imitating expert demonstrations, but these approaches are either limited in modeling complex states or require hand-crafted expert policies. In this paper, we study deformable object manipulation using dense visual affordance, which generalizes to diverse states, and propose a novel kind of foresightful dense affordance that avoids local optima by estimating the value of states for long-term manipulation. We propose a framework for learning this representation, with novel designs such as multi-stage stable learning and efficient self-supervised data collection without experts. Experiments demonstrate the superiority of our proposed foresightful dense affordance. Project page: https://hyperplane-lab.github.io/DeformableAffordance