Large Reconstruction Models have made significant strides in automated 3D content generation from single or multiple input images. Despite their success, these models often produce 3D meshes with geometric inaccuracies, stemming from the inherent challenge of deducing 3D shapes solely from image data. In this work, we introduce a novel framework, the Large Image and Point Cloud Alignment Model (LAM3D), which utilizes 3D point cloud data to enhance the fidelity of generated 3D meshes. Our methodology begins with the development of a point-cloud-based network that generates precise and meaningful latent tri-planes, laying the groundwork for accurate 3D mesh reconstruction. Building upon this, our Image-Point-Cloud Feature Alignment technique processes a single input image, aligning it with the latent tri-planes to imbue the image features with robust 3D information. This process not only enriches the image features but also enables the production of high-fidelity 3D meshes without multi-view input, significantly reducing geometric distortions. Our approach achieves state-of-the-art high-fidelity 3D mesh reconstruction from a single image in just 6 seconds, and experiments on various datasets demonstrate its effectiveness.
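To make the alignment idea concrete, the following is a minimal sketch, not the paper's actual architecture: it treats the point-cloud encoder's output as three fixed latent feature planes and fits a linear projection of a single image-feature vector to them under a mean-squared-error alignment objective. All names, sizes, and the gradient-descent fit are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
C, R, D = 8, 16, 32  # plane channels, plane resolution, image-feature dim (toy sizes)

# Hypothetical stand-ins: the point-cloud network's latent tri-planes
# (three axis-aligned feature planes) and one image's encoded feature vector.
triplanes = rng.standard_normal((3, C, R, R))
img_feat = rng.standard_normal(D)

# Learn a linear map W per plane that projects the image feature onto the
# flattened tri-plane, minimising mean-squared alignment error by gradient descent.
W = np.zeros((3, C * R * R, D))
target = triplanes.reshape(3, -1)
for _ in range(200):
    err = W @ img_feat - target                # residual per plane, shape (3, C*R*R)
    grad = 2 * err[..., None] * img_feat       # dL/dW, shape (3, C*R*R, D)
    W -= 1e-3 * grad
loss = float(np.mean((W @ img_feat - target) ** 2))
```

After fitting, the projected image features reproduce the point-cloud latents almost exactly (`loss` drops to near zero); in the real system this alignment is what lets a single image inherit the 3D structure encoded in the tri-planes.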