Understanding and manipulating diverse 3D objects in everyday human environments is essential yet challenging for future home-assistant robots. Towards building scalable systems that can perform diverse manipulation tasks over various 3D shapes, recent works have advocated, and demonstrated promising results in, learning visual actionable affordance, which labels every point of the input 3D geometry with the likelihood that an action at that point accomplishes the downstream task (e.g., pushing or picking up). However, these works studied only single-gripper manipulation, whereas many real-world tasks require two hands working collaboratively. In this work, we propose a novel learning framework, DualAfford, to learn collaborative affordance for dual-gripper manipulation tasks. The core design of the approach is to reduce the quadratic problem for two grippers into two disentangled yet interconnected subtasks for efficient learning. Using the large-scale PartNet-Mobility and ShapeNet datasets, we set up four benchmark tasks for dual-gripper manipulation. Experiments demonstrate the effectiveness and superiority of our method over three baselines.
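To make the reduction from a quadratic problem to two interconnected subtasks concrete, the following is a minimal sketch, not the DualAfford implementation. It assumes the decomposition is sequential: the first gripper's contact point is chosen from per-point scores, and the second gripper's point is then scored conditioned on that choice, so N points need roughly N + N score evaluations rather than N × N pair evaluations. The names `select_gripper_pair`, `toy_first_score`, and `toy_second_score` are hypothetical, and the toy scoring functions stand in for learned affordance networks.

```python
import math


def select_gripper_pair(points, first_score_fn, second_score_fn):
    """Pick two distinct contact points using two conditional scoring passes.

    first_score_fn(points) returns one score per point for the first gripper.
    second_score_fn(points, first_idx) returns one score per point for the
    second gripper, conditioned on the first gripper's chosen point.
    Both functions are hypothetical placeholders for learned models.
    """
    # Pass 1: score every point for the first gripper and take the best.
    first_scores = first_score_fn(points)
    first_idx = max(range(len(points)), key=lambda i: first_scores[i])

    # Pass 2: score every point again, now conditioned on the first choice.
    second_scores = second_score_fn(points, first_idx)
    candidates = [i for i in range(len(points)) if i != first_idx]
    second_idx = max(candidates, key=lambda i: second_scores[i])
    return first_idx, second_idx


def toy_first_score(points):
    # Toy assumption: the first gripper prefers the point with the smallest x.
    min_x = min(p[0] for p in points)
    return [-(p[0] - min_x) for p in points]


def toy_second_score(points, first_idx):
    # Toy assumption: the second gripper prefers points far from the first.
    return [math.dist(p, points[first_idx]) for p in points]


points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
pair = select_gripper_pair(points, toy_first_score, toy_second_score)
print(pair)
```

The greedy argmax is used here only for brevity; a learned system could instead sample several candidates from each score map and rank the resulting pairs.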