Pixel2Mesh (P2M) is a classical approach for reconstructing 3D shapes from a single color image through coarse-to-fine mesh deformation. Although P2M generates plausible global shapes, its Graph Convolutional Network (GCN) often produces overly smooth results, losing fine-grained geometric details. Moreover, P2M generates unreliable features for occluded regions and struggles with the domain gap between synthetic data and real-world images, a common challenge for single-view 3D reconstruction methods. To address these challenges, we propose a novel Transformer-boosted architecture, named T-Pixel2Mesh, inspired by the coarse-to-fine approach of P2M. Specifically, we use a global Transformer to control the holistic shape and a local Transformer to progressively refine local geometric details with graph-based point upsampling. To enhance real-world reconstruction, we present the simple yet effective Linear Scale Search (LSS), which serves as prompt tuning during input preprocessing. Our experiments on ShapeNet demonstrate state-of-the-art performance, while results on real-world data show the method's generalization capability.
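To make the idea of graph-based point upsampling more concrete, the following is a minimal sketch of one common scheme: inserting a new point at the midpoint of every graph edge. The abstract does not specify which upsampling rule T-Pixel2Mesh uses, so the midpoint rule here is an assumption, and the function name `upsample_by_edge_midpoints` is hypothetical. The sketch only splits edges and does not add connections between the new midpoints.

```python
def upsample_by_edge_midpoints(vertices, edges):
    """Insert one new vertex at the midpoint of every undirected edge.

    Assumption: midpoint insertion is used purely for illustration; it is
    not confirmed to be the rule used by T-Pixel2Mesh.

    vertices: list of (x, y, z) tuples.
    edges: iterable of (i, j) index pairs; reversed duplicates are merged
           and self-loops are skipped.
    Returns (new_vertices, new_edges), where each original edge (i, j)
    with i < j is replaced by the two edges (i, m) and (m, j).
    """
    new_vertices = list(vertices)
    new_edges = []
    seen = set()
    for i, j in edges:
        if i == j:
            continue  # a self-loop has no midpoint to add
        key = (min(i, j), max(i, j))
        if key in seen:
            continue  # treat (i, j) and (j, i) as the same edge
        seen.add(key)
        a, b = vertices[key[0]], vertices[key[1]]
        midpoint = tuple((p + q) / 2.0 for p, q in zip(a, b))
        m = len(new_vertices)  # index of the newly added vertex
        new_vertices.append(midpoint)
        new_edges.append((key[0], m))
        new_edges.append((m, key[1]))
    return new_vertices, new_edges


# Example: a single triangle gains one new point per edge.
triangle = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
triangle_edges = [(0, 1), (1, 2), (2, 0)]
dense_vertices, dense_edges = upsample_by_edge_midpoints(triangle, triangle_edges)
# dense_vertices now holds 6 points and dense_edges holds 6 edges.
```

In a coarse-to-fine pipeline, a step like this would be applied between refinement stages so that later stages operate on a denser point set; how the new points are subsequently refined is outside the scope of this sketch.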