Mesh reconstruction from multi-view images is a fundamental problem in computer vision, but its performance degrades significantly under sparse-view conditions, especially in unseen regions where no ground-truth observations are available. While recent advances in diffusion models have demonstrated strong capabilities in synthesizing novel views from limited inputs, their outputs often suffer from visual artifacts and lack 3D consistency, posing challenges for reliable mesh optimization. In this paper, we propose a novel framework that leverages diffusion models to enhance sparse-view mesh reconstruction in a principled and reliable manner. To address the instability of diffusion outputs, we introduce a Consensus Diffusion Module that filters unreliable generations via interquartile range (IQR) analysis and performs variance-aware image fusion to produce robust pseudo-supervision. Building on this, we design an online reinforcement learning strategy based on the Upper Confidence Bound (UCB), guided by the diffusion loss, to adaptively select the most informative viewpoints for enhancement. Finally, the fused images, together with the sparse-view ground truth, jointly supervise a NeRF-based model, enforcing consistency in both geometry and appearance. Extensive experiments demonstrate that our method achieves significant improvements in both geometric and rendering quality.
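As a rough illustration of the filtering-and-fusion step described above, the sketch below shows one plausible PyTorch realization of IQR-based outlier rejection followed by variance-aware fusion across several diffusion generations of the same viewpoint. The function name consensus_fuse, the 1.5×IQR fence, and the inverse-variance confidence map are our assumptions for illustration, not details taken from the paper.

```python
import torch

def consensus_fuse(candidates: torch.Tensor, eps: float = 1e-6):
    """Filter and fuse diffusion-generated candidates for one viewpoint.

    candidates: (N, H, W, 3) tensor holding N generations of the same view.
    Returns a fused pseudo-ground-truth image and a per-pixel confidence map.
    """
    # Per-pixel quartiles across the N candidates define an IQR outlier fence.
    q1 = torch.quantile(candidates, 0.25, dim=0)
    q3 = torch.quantile(candidates, 0.75, dim=0)
    iqr = q3 - q1
    inlier = (candidates >= q1 - 1.5 * iqr) & (candidates <= q3 + 1.5 * iqr)
    mask = inlier.float()
    count = mask.sum(dim=0).clamp(min=1.0)

    # Fuse the surviving samples and track their residual variance per pixel.
    fused = (candidates * mask).sum(dim=0) / count
    var = (((candidates - fused) ** 2) * mask).sum(dim=0) / count

    # Variance-aware confidence: pixels where the generations disagree are
    # down-weighted when the fused image later supervises the NeRF.
    confidence = 1.0 / (var + eps)
    confidence = confidence / confidence.max()
    return fused, confidence
```

In this reading, the confidence map is what makes the fusion "variance-aware": it can scale the photometric loss on the pseudo-supervised views so that unstable regions contribute less than regions where the diffusion samples agree.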
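The UCB-driven viewpoint selection can likewise be pictured as a standard multi-armed bandit loop, where each candidate view is an arm and the reward is derived from the diffusion loss observed when that view is enhanced. The class name UCBViewSelector, the reward convention, and the exploration constant c below are hypothetical placeholders for the paper's actual strategy.

```python
import math
import numpy as np

class UCBViewSelector:
    """UCB1-style online selection over candidate viewpoints."""

    def __init__(self, num_views: int, c: float = 1.0):
        self.c = c
        self.counts = np.zeros(num_views)   # times each view was enhanced
        self.values = np.zeros(num_views)   # running mean reward per view
        self.total = 0

    def select(self) -> int:
        # Try every view once before exploiting the UCB scores.
        untried = np.where(self.counts == 0)[0]
        if len(untried) > 0:
            return int(untried[0])
        ucb = self.values + self.c * np.sqrt(
            2.0 * math.log(self.total) / self.counts)
        return int(np.argmax(ucb))

    def update(self, view: int, reward: float) -> None:
        # Incremental mean update for the chosen arm; the reward could be,
        # e.g., a normalized diffusion loss so that harder-to-synthesize
        # (more informative) views are revisited more often.
        self.counts[view] += 1
        self.total += 1
        self.values[view] += (reward - self.values[view]) / self.counts[view]
```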