It is essential yet challenging for future home-assistant robots to understand and manipulate diverse 3D objects in daily human environments. Toward building scalable systems that can perform diverse manipulation tasks over various 3D shapes, recent works have advocated, and demonstrated promising results in, learning visual actionable affordance, which labels every point on the input 3D geometry with the likelihood that an action at that point accomplishes the downstream task (e.g., pushing or picking up). However, these works studied only single-gripper manipulation, whereas many real-world tasks require two hands working collaboratively. In this work, we propose a novel learning framework, DualAfford, to learn collaborative affordance for dual-gripper manipulation tasks. The core design of the approach is to reduce the quadratic problem for two grippers into two disentangled yet interconnected subtasks for efficient learning. Using the large-scale PartNet-Mobility and ShapeNet datasets, we set up four benchmark tasks for dual-gripper manipulation. Experiments demonstrate the effectiveness of our method and its superiority over three baselines.
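To illustrate why splitting the two-gripper problem into two linked subtasks can help, the following is a minimal sketch, not the DualAfford implementation. It assumes per-point scores are available from two hypothetical functions, `first_score` and `second_score`, which stand in for learned affordance predictors; the function name `select_dual_contacts` and the toy scoring rules are likewise assumptions made for illustration. Scoring all pairs of N points jointly would take N² evaluations, whereas choosing the first contact and then scoring the second contact conditioned on it takes 2N.

```python
import numpy as np

def select_dual_contacts(points, first_score, second_score):
    """Pick two contact indices using 2N score evaluations rather than N^2.

    first_score(p) and second_score(p, q) are hypothetical placeholders
    for learned per-point affordance predictors.
    """
    # Stage 1: score every point for the first gripper (N evaluations).
    a1 = np.array([first_score(p) for p in points], dtype=float)
    i = int(np.argmax(a1))

    # Stage 2: score every point for the second gripper, conditioned on
    # the first gripper's chosen contact (another N evaluations).
    a2 = np.array([second_score(points[i], q) for q in points], dtype=float)
    a2[i] = -np.inf  # assumption: the grippers must not share a contact point
    j = int(np.argmax(a2))
    return i, j

# Toy example with made-up scoring rules (assumptions, not from the paper).
rng = np.random.default_rng(0)
points = rng.uniform(-1.0, 1.0, size=(256, 3))
first = lambda p: -abs(p[0] + 0.5)            # prefer x close to -0.5
second = lambda p, q: np.linalg.norm(p - q)   # prefer points far from the first contact
i, j = select_dual_contacts(points, first, second)
```

The second stage depends on the output of the first, which is one simple way to keep the two subtasks separate yet connected; a greedy choice like this is only an approximation of the best joint pair.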