Accurate 3D shape abstraction from a single 2D image is a long-standing problem in computer vision and graphics. By representing the target shape with a set of primitives, recent methods have achieved promising results. However, these methods either use a relatively large number of primitives or lack geometric flexibility because of the limited expressiveness of the primitives. In this paper, we propose DeFormer, a novel bi-channel Transformer architecture integrated with parameterized deformable models, which simultaneously estimates the global and local deformations of primitives. In this way, DeFormer can abstract complex object shapes with a small number of primitives that offer broader geometric coverage and finer details. We further introduce force-driven dynamic fitting and a cycle-consistent re-projection loss to optimize the primitive parameters. Extensive experiments on ShapeNet across various settings show that DeFormer achieves better reconstruction accuracy than the state of the art, and its visualizations exhibit consistent semantic correspondences for improved interpretability.
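The abstract does not spell out how a "parameterized deformable model" combines global and local deformation. As a minimal sketch, assuming a deformable superquadric in the style of Terzopoulos and Metaxas, a surface point is built in three stages: a base superquadric, a global deformation that reshapes the whole primitive (here, only linear tapering along z), and a local per-point displacement, followed by a translation. All function names, and the choice of tapering as the sole global deformation, are hypothetical and not taken from DeFormer's implementation.

```python
import math


def _signed_pow(base: float, exp: float) -> float:
    """sign(base) * |base|**exp, the usual superquadric exponentiation."""
    return math.copysign(abs(base) ** exp, base)


def superquadric_point(eta, omega, a, eps):
    """Point on a superquadric with scales a=(a1,a2,a3) and exponents eps=(e1,e2).

    eta is the latitude in [-pi/2, pi/2], omega the longitude in [-pi, pi).
    """
    a1, a2, a3 = a
    e1, e2 = eps
    ce = math.cos(eta)
    x = a1 * _signed_pow(ce, e1) * _signed_pow(math.cos(omega), e2)
    y = a2 * _signed_pow(ce, e1) * _signed_pow(math.sin(omega), e2)
    z = a3 * _signed_pow(math.sin(eta), e1)
    return (x, y, z)


def taper(p, a3, tx, ty):
    """Global deformation (assumed): linear tapering along z.

    x and y are scaled by a factor that varies linearly with height z.
    """
    x, y, z = p
    return ((tx * z / a3 + 1.0) * x, (ty * z / a3 + 1.0) * y, z)


def deformable_primitive_point(eta, omega, a, eps, tapering, local_disp, translation):
    """Base shape -> global deformation -> local displacement -> translation."""
    s = superquadric_point(eta, omega, a, eps)
    g = taper(s, a[2], *tapering)
    # Local deformation: a free-form per-point offset on top of the global shape.
    return tuple(gi + di + ti for gi, di, ti in zip(g, local_disp, translation))
```

With exponents `(1, 1)` and unit scales, the base shape is the unit sphere; tapering then widens the upper half (z > 0) when `tx > 0`, and `local_disp` adds fine detail that the few global parameters cannot express.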