In this paper, we explore a novel framework, EGIInet (Explicitly Guided Information Interaction Network), for the View-guided Point cloud Completion (ViPC) task, which aims to restore a complete point cloud from a partial one with the guidance of a single-view image. In contrast to previous methods that rely on the global semantics of input images, EGIInet efficiently combines the information from the two modalities by leveraging the geometric nature of the completion task. Specifically, we propose an explicitly guided information interaction strategy supported by modal alignment for point cloud completion. First, unlike previous methods that simply use separate 2D and 3D backbones to encode features, we unify the encoding process to promote modal alignment. Second, we propose a novel explicitly guided information interaction strategy that helps the network identify critical information within images, thus providing better guidance for completion. Extensive experiments demonstrate the effectiveness of our framework: we achieve a new state of the art (a 16% improvement in CD over XMFnet) on benchmark datasets while using fewer parameters than previous methods. The pre-trained model and code are available at https://github.com/WHU-USI3DV/EGIInet.
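To make the two ideas named in the abstract concrete, the following is a minimal, illustrative sketch, not the authors' implementation: it shows (1) a weight-shared encoder applied to both image tokens and point tokens to encourage modal alignment, and (2) an explicit interaction step in which image tokens guide the partial-point-cloud tokens. All module names, dimensions, and the cross-attention formulation here are assumptions made for illustration only.

```python
# Illustrative sketch only; the actual EGIInet architecture may differ in every detail.
import torch
import torch.nn as nn


class UnifiedEncoderSketch(nn.Module):
    def __init__(self, dim=256, n_heads=4, n_layers=2):
        super().__init__()
        # Lightweight per-modality tokenizers mapping raw inputs into one shared token space.
        self.point_embed = nn.Linear(3, dim)             # (B, N, 3) points -> (B, N, dim)
        self.pixel_embed = nn.Linear(3 * 16 * 16, dim)   # flattened 16x16 RGB patches -> (B, P, dim)
        # A single weight-shared transformer encodes both modalities (unified encoding).
        layer = nn.TransformerEncoderLayer(dim, n_heads, batch_first=True)
        self.shared_encoder = nn.TransformerEncoder(layer, n_layers)
        # Explicit interaction: point tokens attend to image tokens so the network can
        # pick out the image regions that are informative for the missing geometry.
        self.guided_interaction = nn.MultiheadAttention(dim, n_heads, batch_first=True)

    def forward(self, partial_points, image_patches):
        pts = self.shared_encoder(self.point_embed(partial_points))
        img = self.shared_encoder(self.pixel_embed(image_patches))
        guided, _ = self.guided_interaction(query=pts, key=img, value=img)
        # The guided point tokens would then feed a decoder that generates the complete cloud.
        return pts + guided


if __name__ == "__main__":
    model = UnifiedEncoderSketch()
    partial = torch.randn(2, 2048, 3)            # batch of partial point clouds
    patches = torch.randn(2, 196, 3 * 16 * 16)   # batch of flattened image patches
    print(model(partial, patches).shape)         # torch.Size([2, 2048, 256])
```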