GaMO: Geometry-aware Multi-view Diffusion Outpainting for Sparse-View 3D Reconstruction

3D reconstruction methods now capture scenes at high quality from dense multi-view imagery, yet they still struggle when input views are limited. Various approaches, including regularization techniques, semantic priors, and geometric constraints, have been proposed to address this challenge. Recent diffusion-based methods have demonstrated substantial improvements by generating novel views from new camera poses to augment training data, surpassing earlier regularization and prior-based techniques. Despite this progress, we identify three critical limitations in these state-of-the-art approaches: inadequate coverage beyond the periphery of known views, geometric inconsistencies across generated views, and computationally expensive pipelines. We introduce GaMO (Geometry-aware Multi-view Outpainter), a framework that reformulates sparse-view reconstruction as multi-view outpainting. Instead of generating new viewpoints, GaMO expands the field of view from existing camera poses, which inherently preserves geometric consistency while providing broader scene coverage. Our approach applies multi-view conditioning and geometry-aware denoising strategies in a zero-shot manner, without any training. Extensive experiments on Replica and ScanNet++ demonstrate state-of-the-art reconstruction quality across 3, 6, and 9 input views, outperforming prior methods in PSNR and LPIPS, while achieving a $25\times$ speedup over state-of-the-art diffusion-based methods with processing time under 10 minutes. Project page: https://yichuanh.github.io/GaMO/
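To make the idea of expanding the field of view from an existing camera pose concrete, the following is a minimal sketch of the underlying pinhole-camera arithmetic, not the GaMO implementation. It assumes the outpainted canvas is the original image padded symmetrically by a factor `scale`, with the focal length and the camera pose left unchanged; the function names and the `scale` parameter are hypothetical and chosen only for illustration.

```python
import math


def outpainted_intrinsics(fx, fy, cx, cy, width, height, scale):
    """Intrinsics of a canvas enlarged by `scale` around the original image.

    Assumption: the focal length stays fixed and the original pixels sit in
    the center of the larger canvas, so only the principal point and the
    image size change. The camera pose (extrinsics) is untouched.
    """
    new_width = round(width * scale)
    new_height = round(height * scale)
    # Padding added on the left and top shifts the principal point.
    pad_x = (new_width - width) / 2
    pad_y = (new_height - height) / 2
    return fx, fy, cx + pad_x, cy + pad_y, new_width, new_height


def horizontal_fov_deg(fx, width):
    """Horizontal field of view of a pinhole camera, in degrees."""
    return math.degrees(2 * math.atan(width / (2 * fx)))


if __name__ == "__main__":
    fx = fy = 500.0
    width, height = 1000, 800
    cx, cy = width / 2, height / 2

    fx2, fy2, cx2, cy2, w2, h2 = outpainted_intrinsics(
        fx, fy, cx, cy, width, height, scale=1.5
    )
    print("original FOV:", horizontal_fov_deg(fx, width))
    print("outpainted FOV:", horizontal_fov_deg(fx2, w2))
```

Because the pose and focal length are shared between the original and the enlarged view, every original pixel maps to the same ray in both; only the newly added border has to be synthesized, which is one way to read the claim that outpainting preserves geometric consistency.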
