Open-Vocabulary 3D object affordance grounding aims to anticipate ``action possibility'' regions on 3D objects given arbitrary instructions, which is crucial for robots to generically perceive real scenarios and respond to operational changes. Existing methods focus on combining images or language that depict interactions with 3D geometries to introduce external interaction priors. However, they remain confined to a limited semantic space because they fail to leverage implied invariant geometries and potential interaction intentions. Humans, by contrast, address complex tasks through multi-step reasoning and respond to diverse situations through associative and analogical thinking. In light of this, we propose GREAT (GeometRy-intEntion collAboraTive inference) for Open-Vocabulary 3D Object Affordance Grounding, a novel framework that mines invariant geometric attributes of objects and performs analogical reasoning over potential interaction scenarios to form affordance knowledge, then fully combines this knowledge with both geometries and visual content to ground 3D object affordance. In addition, we introduce the Point Image Affordance Dataset v2 (PIADv2), currently the largest 3D object affordance dataset, to support the task. Extensive experiments demonstrate the effectiveness and superiority of GREAT. Code and dataset are available at the project page.