Identifying affordance regions on 3D objects from semantic cues is essential for robotics and human-machine interaction. However, existing 3D affordance learning methods struggle with generalization and robustness due to limited annotated data and a reliance on 3D backbones focused on geometric encoding, which often lack resilience to real-world noise and data corruption. We propose GEAL, a novel framework designed to enhance the generalization and robustness of 3D affordance learning by leveraging large-scale pre-trained 2D models. We employ a dual-branch architecture with Gaussian splatting to establish consistent mappings between 3D point clouds and 2D representations, enabling realistic 2D renderings from sparse point clouds. A granularity-adaptive fusion module and a 2D-3D consistency alignment module further strengthen cross-modal alignment and knowledge transfer, allowing the 3D branch to benefit from the rich semantics and generalization capacity of 2D models. To holistically assess robustness, we introduce two new corruption-based benchmarks: PIAD-C and LASO-C. Extensive experiments on public datasets and our benchmarks show that GEAL consistently outperforms existing methods across seen and novel object categories, as well as on corrupted data, demonstrating robust and adaptable affordance prediction under diverse conditions. Code and the corruption datasets have been made publicly available.