2L3: Lifting Imperfect Generated 2D Images into Accurate 3D

Reconstructing a 3D object from a single image is an intriguing but challenging problem. One promising solution is to use multi-view (MV) 3D reconstruction to fuse generated MV images into a consistent 3D object. However, the generated images usually suffer from inconsistent lighting, misaligned geometry, and sparse views, which leads to poor reconstruction quality. To address these three issues, we present a novel 3D reconstruction framework that leverages intrinsic decomposition guidance, transient-mono prior guidance, and view augmentation, respectively. Specifically, we first use intrinsic decomposition to decouple the shading information from the generated images, reducing the impact of inconsistent lighting; then, we introduce a mono prior with view-dependent transient encoding to enhance the reconstructed normals; and finally, we design a view augmentation fusion strategy that minimizes a pixel-level loss on the generated sparse views and a semantic loss on augmented random views, resulting in view-consistent geometry and detailed textures. Our approach therefore enables the integration of a pre-trained MV image generator with a neural network-based volumetric signed distance function (SDF) representation for single-image-to-3D object reconstruction. We evaluate our framework on various datasets and demonstrate superior performance in both quantitative and qualitative assessments, which marks a significant advance in 3D object reconstruction. Compared with the latest state-of-the-art method Syncdreamer~\cite{liu2023syncdreamer}, we reduce the Chamfer Distance error by about 36\% and improve PSNR by about 30\%.
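For readers unfamiliar with the two reported metrics, the following is a minimal NumPy sketch of how Chamfer Distance and PSNR are commonly computed. It is an illustration under stated assumptions, not the evaluation code behind the numbers above: the function names are hypothetical, the Chamfer Distance here is the symmetric mean of unsquared nearest-neighbor distances (other conventions square the distances or sum instead of averaging), and images are assumed to be floating-point arrays in the range [0, 1].

```python
import numpy as np


def chamfer_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Symmetric Chamfer Distance between point sets a (N, 3) and b (M, 3).

    Assumed convention: mean nearest-neighbor Euclidean distance from a to b
    plus the same from b to a. Papers differ on squaring and normalization.
    """
    # Pairwise Euclidean distances via broadcasting, shape (N, M).
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    a_to_b = d.min(axis=1).mean()  # each point in a to its nearest point in b
    b_to_a = d.min(axis=0).mean()  # each point in b to its nearest point in a
    return float(a_to_b + b_to_a)


def psnr(img: np.ndarray, ref: np.ndarray, max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB between two images of equal shape."""
    mse = np.mean((img.astype(np.float64) - ref.astype(np.float64)) ** 2)
    if mse == 0.0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_val ** 2 / mse))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pts_pred = rng.random((500, 3))   # stand-in for points sampled from a reconstruction
    pts_gt = rng.random((500, 3))     # stand-in for points sampled from ground truth
    print("Chamfer Distance:", chamfer_distance(pts_pred, pts_gt))

    render = rng.random((64, 64, 3))  # stand-in for a rendered view
    target = rng.random((64, 64, 3))  # stand-in for the reference view
    print("PSNR (dB):", psnr(render, target))
```

The brute-force pairwise distance matrix uses O(NM) memory, so for dense point clouds a k-d tree or batched nearest-neighbor search would be the more practical choice. Lower Chamfer Distance and higher PSNR indicate better geometry and appearance, respectively.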
