Affordance detection is a challenging problem with a wide range of robotic applications. Previous work has been limited by the complexity of 3D object shapes, the wide range of potential affordances on real-world objects, and the lack of open-vocabulary support for affordance understanding. In this paper, we introduce a new open-vocabulary affordance detection method for 3D point clouds that leverages knowledge distillation and text-point correlation. Our approach uses knowledge distillation from pre-trained 3D models to enhance feature extraction and semantic understanding in 3D point clouds. We further introduce a new text-point correlation method to learn the semantic links between point cloud features and open-vocabulary labels. Extensive experiments show that our approach outperforms previous works and adapts to new affordance labels and unseen objects. Notably, our method achieves an improvement of 7.96% in mIoU over the baselines. Furthermore, it offers real-time inference, which makes it well suited to robotic manipulation applications.