Finding correspondences is a fundamental and extensively researched problem in computer vision and graphics. In this work, we examine the underexplored task of estimating segmentation-to-segmentation correspondence between images in the wild and untextured 3D shapes. This task is highly challenging due to substantial differences in appearance, geometry, and viewpoint. Our approach bridges the cross-modality gap by linking pixels in the image segment to vertices in the corresponding semantic part of the 3D shape. To achieve this, we first distill deep visual features from a 2D vision model onto the 3D shape surface, allowing for the computation of feature similarity between image pixels and shape vertices. Then, we identify Best Segmentation Buddies, vertices whose most similar image pixel lies within the image segmentation region, enabling the reliable discovery of vertices in semantically corresponding shape parts. Finally, we leverage distilled 3D features from the 2D image segmentation model to segment the shape directly in 3D, bootstrapping the correspondence process. We demonstrate the generality and robustness of our approach across a wide range of image-shape pairs, showcasing accurate and semantically meaningful correspondences. Our project page is at https://threedle.github.io/bsb/.
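The Best Segmentation Buddies criterion described above can be illustrated with a minimal sketch. It assumes that per-vertex and per-pixel features are already available as plain vectors and that similarity is measured with cosine similarity; the abstract only says "feature similarity", so that choice, along with the function and variable names, is hypothetical and not taken from the authors' implementation.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def best_segmentation_buddies(vertex_feats, pixel_feats, pixel_in_segment):
    """Return indices of vertices whose most similar pixel lies in the segment.

    vertex_feats: per-vertex feature vectors (assumed already distilled onto the shape).
    pixel_feats: per-pixel feature vectors from the 2D model.
    pixel_in_segment: booleans, True where the pixel is inside the image segment.
    """
    buddies = []
    for v, vf in enumerate(vertex_feats):
        # Index of the image pixel most similar to this vertex.
        best_pixel = max(range(len(pixel_feats)),
                         key=lambda p: cosine(vf, pixel_feats[p]))
        # Keep the vertex only if that pixel falls inside the segment.
        if pixel_in_segment[best_pixel]:
            buddies.append(v)
    return buddies


# Toy data (made up for illustration): three pixels, two of them in the segment.
pixel_feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
pixel_in_segment = [True, True, False]
vertex_feats = [[1.0, 0.1], [0.1, 1.0], [0.7, 0.2]]

print(best_segmentation_buddies(vertex_feats, pixel_feats, pixel_in_segment))  # [0, 2]
```

In this toy example, vertices 0 and 2 are closest to a pixel inside the segment and are kept, while vertex 1 is closest to the pixel outside it and is discarded. The sketch covers only the selection rule; feature distillation and the 3D segmentation step mentioned in the abstract are not modeled here.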