We present an automated technique for computing a map between two genus-zero shapes that matches semantically corresponding regions to one another. The lack of annotated data prohibits direct inference of 3D semantic priors; instead, current state-of-the-art methods predominantly optimize geometric properties or require varying amounts of manual annotation. To overcome the lack of annotated training data, we distill semantic matches from pre-trained vision models: our method renders the pair of 3D shapes from multiple viewpoints, and the resulting renders are fed into an off-the-shelf image-matching method that leverages a pre-trained visual model to produce feature points. This yields semantic correspondences, which can be projected back onto the 3D shapes, producing a raw matching that is inaccurate and inconsistent across viewpoints. These correspondences are refined and distilled into an inter-surface map by a dedicated optimization scheme, which promotes bijectivity and continuity of the output map. We show that our approach can generate semantic surface-to-surface maps without requiring manual annotations or any 3D training data. Furthermore, it proves effective both in scenarios with high semantic complexity, where objects are non-isometrically related, and in situations where they are nearly isometric.
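As a rough illustration of why the raw multi-view matching needs refinement, the sketch below aggregates per-view correspondences by simple voting. It is not the optimization scheme described in the abstract, which is not specified here; it is a much simpler stand-in. It assumes that each 2D match has already been lifted to a pair of vertex indices (source vertex, target vertex), for example through a rendered vertex-index buffer. The function name `aggregate_matches` and the parameter `min_votes` are hypothetical.

```python
from collections import Counter


def aggregate_matches(per_view_matches, min_votes=2):
    """Filter noisy multi-view correspondences by voting.

    per_view_matches: list (one entry per viewpoint) of lists of
        (source_vertex, target_vertex) pairs. ASSUMPTION: pixels have
        already been projected back to vertex indices on each shape.
    min_votes: hypothetical threshold; a pair must be seen in at least
        this many views to be kept.

    Returns a dict source_vertex -> target_vertex that is one-to-one
    on the kept vertices (a crude proxy for bijectivity, not the
    paper's optimization).
    """
    votes = Counter()
    for matches in per_view_matches:
        # Count each pair at most once per view.
        for pair in set(matches):
            votes[pair] += 1

    # Track the most-voted partner for every source and target vertex.
    best_for_src = {}
    best_for_tgt = {}
    for (src, tgt), n in votes.items():
        if n > best_for_src.get(src, (None, 0))[1]:
            best_for_src[src] = (tgt, n)
        if n > best_for_tgt.get(tgt, (None, 0))[1]:
            best_for_tgt[tgt] = (src, n)

    # Keep only mutually best pairs with enough support across views.
    kept = {}
    for src, (tgt, n) in best_for_src.items():
        if n >= min_votes and best_for_tgt[tgt][0] == src:
            kept[src] = tgt
    return kept


# Three hypothetical views; vertex 1 is matched inconsistently.
views = [
    [(0, 10), (1, 11)],
    [(0, 10), (1, 12)],
    [(0, 10), (1, 11), (2, 10)],
]
sparse_map = aggregate_matches(views)
```

A filter of this kind only yields a sparse, one-to-one set of landmark pairs. Producing a continuous, bijective map over the whole surface still requires an optimization step such as the one the abstract refers to.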