Most current point cloud registration methods rely on features extracted from the points alone. Because they depend on a single modality, these methods can suffer from weak perception of global features and a lack of texture information. Humans, by contrast, use visual information learned from 2D images to comprehend the 3D world. Motivated by this observation, we present a novel Cross-Modal Information-Guided Network (CMIGNet), which obtains global shape perception through cross-modal information to achieve precise and robust point cloud registration. Specifically, we first incorporate images projected from the point clouds and fuse the cross-modal features using an attention mechanism. We then employ two contrastive learning strategies: overlapping contrastive learning, which focuses on features in the overlapping regions, and cross-modal contrastive learning, which emphasizes the correspondences between 2D and 3D features. Finally, we propose a mask prediction module to identify keypoints in the point clouds. Extensive experiments on several benchmark datasets demonstrate that our network achieves superior registration performance.
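The abstract does not give the form of the cross-modal contrastive objective. As a minimal sketch, assuming a symmetric InfoNCE loss over paired global features (a common choice for contrastive learning, not confirmed to be the authors' formulation), the idea of pulling matching 2D and 3D features together can be illustrated as follows. The function name `cross_modal_info_nce`, the `temperature` parameter, and the feature shapes are hypothetical and chosen only for illustration.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)


def log_softmax(logits):
    """Numerically stable row-wise log-softmax."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))


def cross_modal_info_nce(feat_3d, feat_2d, temperature=0.1):
    """Symmetric InfoNCE loss between paired 3D and 2D features.

    feat_3d, feat_2d: arrays of shape (batch, dim); row i of each array is
    assumed to describe the same point cloud (an assumption of this sketch).
    """
    z3 = l2_normalize(feat_3d)
    z2 = l2_normalize(feat_2d)
    # Pairwise cosine similarities; the diagonal holds the positive pairs.
    logits = z3 @ z2.T / temperature
    idx = np.arange(len(z3))
    loss_3d_to_2d = -log_softmax(logits)[idx, idx].mean()
    loss_2d_to_3d = -log_softmax(logits.T)[idx, idx].mean()
    return 0.5 * (loss_3d_to_2d + loss_2d_to_3d)
```

Under these assumptions, the loss is small when row i of the 2D features is most similar to row i of the 3D features and grows when the pairing is broken, which is the behavior the cross-modal strategy described above relies on.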