Zero-Shot 3D Shape Correspondence

We propose a novel zero-shot approach to computing correspondences between 3D shapes. Existing approaches mainly focus on isometric and near-isometric shape pairs (e.g., human vs. human), while less attention has been given to strongly non-isometric and inter-class shape matching (e.g., human vs. cow). To this end, we introduce a fully automatic method that exploits the reasoning capabilities of recent foundation models in language and vision to tackle difficult shape correspondence problems. Our approach comprises multiple stages. First, we classify the 3D shapes in a zero-shot manner by feeding rendered shape views to a language-vision model (e.g., BLIP2) to generate a list of class proposals per shape. These proposals are unified into a single class per shape using the reasoning capabilities of ChatGPT. Second, we segment the two shapes in a zero-shot manner; in contrast to the co-segmentation problem, we do not require the two shapes to share a common set of semantic regions. Instead, we exploit the in-context learning capabilities of ChatGPT to generate a separate set of semantic regions for each shape, together with a semantic mapping between the two sets. This enables our approach to match strongly non-isometric shapes with significant differences in geometric structure. Finally, we use the generated semantic mapping to produce coarse correspondences, which can be further refined by the functional maps framework to produce dense point-to-point maps. Our approach, despite its simplicity, produces highly plausible results in a zero-shot manner, especially between strongly non-isometric shapes. Project webpage: https://samir55.github.io/3dshapematch/.
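To make the final stage more concrete, the following is a minimal sketch of one way a semantic region mapping could be turned into coarse correspondences. It is an illustration under stated assumptions, not the method described above: the abstract does not specify how coarse correspondences are derived, so pairing the vertices nearest to each region centroid is a stand-in heuristic. The function name `coarse_correspondences`, the per-vertex label arrays, and the `region_map` dictionary are all hypothetical.

```python
import numpy as np


def coarse_correspondences(verts_a, labels_a, verts_b, labels_b, region_map):
    """Return one (vertex_a, vertex_b) landmark pair per mapped region pair.

    Assumption: each shape comes with a per-vertex array of region names,
    and region_map maps region names of shape A to region names of shape B.
    The centroid heuristic is illustrative only.
    """
    pairs = []
    for name_a, name_b in region_map.items():
        idx_a = np.flatnonzero(labels_a == name_a)
        idx_b = np.flatnonzero(labels_b == name_b)
        if idx_a.size == 0 or idx_b.size == 0:
            # The region is absent from one segmentation, so skip it.
            continue
        # Centroid of each region in 3D space.
        centroid_a = verts_a[idx_a].mean(axis=0)
        centroid_b = verts_b[idx_b].mean(axis=0)
        # Pick the region vertex closest to its own centroid as a landmark.
        dist_a = np.linalg.norm(verts_a[idx_a] - centroid_a, axis=1)
        dist_b = np.linalg.norm(verts_b[idx_b] - centroid_b, axis=1)
        pairs.append((int(idx_a[np.argmin(dist_a)]), int(idx_b[np.argmin(dist_b)])))
    return pairs
```

Because the mapping is defined between two different region sets, a region such as "arm" on a human can be paired with "foreleg" on a cow without either shape needing the other's label vocabulary. Sparse landmark pairs of this kind are the sort of coarse input that a functional maps refinement step could then densify.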
