We propose MeshLRM, a novel approach based on large reconstruction models (LRMs) that can reconstruct a high-quality mesh from only four input images in less than one second. Unlike previous LRMs, which focus on NeRF-based reconstruction, MeshLRM incorporates differentiable mesh extraction and rendering within the LRM framework. This allows for end-to-end mesh reconstruction by fine-tuning a pre-trained NeRF LRM with mesh rendering. Moreover, we improve the LRM architecture by simplifying several complex designs in previous LRMs. MeshLRM's NeRF initialization is trained sequentially, first on low-resolution and then on high-resolution images; this new LRM training strategy enables significantly faster convergence and thereby leads to better quality with less compute. Our approach achieves state-of-the-art mesh reconstruction from sparse-view inputs and also enables many downstream applications, including text-to-3D and single-image-to-3D generation. Project page: https://sarahweiii.github.io/meshlrm/