It is essential yet challenging for future home-assistant robots to understand and manipulate diverse 3D objects in daily human environments. Toward building scalable systems that can perform diverse manipulation tasks over various 3D shapes, recent works have advocated and demonstrated promising results in learning visual actionable affordance, which labels every point on the input 3D geometry with the likelihood that an action at that point accomplishes the downstream task (e.g., pushing or picking up). However, these works have studied only single-gripper manipulation tasks, whereas many real-world tasks require two hands working collaboratively. In this work, we propose a novel learning framework, DualAfford, to learn collaborative affordance for dual-gripper manipulation tasks. The core design of the approach is to reduce the quadratic problem for two grippers to two disentangled yet interconnected subtasks for efficient learning. Using the large-scale PartNet-Mobility and ShapeNet datasets, we set up four benchmark tasks for dual-gripper manipulation. Experiments demonstrate the effectiveness and superiority of our method over three baselines.
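To illustrate why splitting the two-gripper problem into two conditioned subtasks can be cheaper than searching jointly, the following is a minimal toy sketch and not the DualAfford implementation. It assumes contact points on a 1D bar and uses hand-written scoring functions (`pair_score`, `first_score`, `second_score`, all hypothetical names) in place of learned affordance networks. A joint search scores every ordered pair of contact points, while the factorized search first picks a point for gripper 1 and then picks gripper 2 conditioned on that choice.

```python
import itertools

# Toy setup (assumption): candidate contact points are positions on a 1D bar.
points = list(range(10))

def pair_score(p1, p2):
    """Hypothetical joint score: grippers placed far apart hold the bar better."""
    return abs(p1 - p2)

def first_score(p1):
    """Hypothetical score for gripper 1 alone: how good the best partner could be."""
    return max(p1 - points[0], points[-1] - p1)

def second_score(p1, p2):
    """Hypothetical score for gripper 2, conditioned on gripper 1's choice."""
    return abs(p1 - p2)

def joint_search(points, pair_score):
    """Score every ordered pair: len(points) ** 2 evaluations."""
    best = max(itertools.product(points, repeat=2),
               key=lambda pair: pair_score(*pair))
    return best, len(points) ** 2

def factorized_search(points, first_score, second_score):
    """Choose gripper 1, then gripper 2 given gripper 1: 2 * len(points) evaluations."""
    p1 = max(points, key=first_score)
    p2 = max(points, key=lambda q: second_score(p1, q))
    return (p1, p2), 2 * len(points)

joint_result = joint_search(points, pair_score)
factorized_result = factorized_search(points, first_score, second_score)
print("joint:     ", joint_result)
print("factorized:", factorized_result)
```

In this toy case both searches select the same pair, but the factorized version needs a number of evaluations that grows linearly rather than quadratically with the number of candidate points. The two stages stay interconnected because the first score anticipates what the second gripper can achieve and the second score is conditioned on the first choice.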