Reconstructing indoor scenes from multi-view RGB images is challenging because flat, texture-less regions coexist with delicate, fine-grained ones. Recent methods use neural radiance fields aided by predicted surface normal priors to recover the scene geometry. These methods produce complete and smooth results for floor and wall areas. However, they struggle to capture complex surfaces with high-frequency structures, because the neural representation is inadequate and the predicted normal priors are inaccurate. To improve the capacity of the implicit representation, we propose a hybrid architecture that represents low-frequency and high-frequency regions separately. To enhance the normal priors, we introduce a simple yet effective image sharpening and denoising technique, coupled with a network that estimates the pixel-wise uncertainty of the predicted surface normal vectors. Identifying this uncertainty prevents our model from being misled by unreliable surface normal supervision, which hinders the accurate reconstruction of intricate geometries. Experiments on benchmark datasets show that our method significantly outperforms existing methods in reconstruction quality.
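The abstract does not say how the estimated uncertainty enters the training objective. As a rough illustration of the idea only, the sketch below down-weights the normal supervision at pixels where the prior is uncertain. The function name, the cosine-based error, and the weighting `1 - uncertainty` are all assumptions made for this example and are not taken from the paper.

```python
import math


def _normalize(v, eps=1e-8):
    """Scale a 3-vector to (approximately) unit length."""
    norm = math.sqrt(sum(c * c for c in v))
    return [c / (norm + eps) for c in v]


def uncertainty_weighted_normal_loss(rendered, prior, uncertainty, eps=1e-8):
    """Hypothetical loss: cosine error between rendered and prior normals,
    weighted per pixel by (1 - uncertainty).

    rendered, prior: lists of 3-vectors, one per pixel.
    uncertainty: list of floats, assumed to lie in [0, 1], where 1 means
        the predicted prior normal is considered fully unreliable.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for n_r, n_p, u in zip(rendered, prior, uncertainty):
        n_r, n_p = _normalize(n_r), _normalize(n_p)
        cos = sum(a * b for a, b in zip(n_r, n_p))
        # Assumed weighting: confident pixels count fully, uncertain ones fade out.
        w = 1.0 - min(max(u, 0.0), 1.0)
        weighted_sum += w * (1.0 - cos)
        weight_total += w
    # Normalize by the total weight so the loss scale does not depend on
    # how many pixels were down-weighted.
    return weighted_sum / (weight_total + eps)


# Two pixels: the first prior agrees with the rendering, the second is off by 90 degrees.
rendered = [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]]
prior = [[0.0, 0.0, 1.0], [1.0, 0.0, 0.0]]

# Marking the second prior as fully uncertain removes its influence (loss close to 0).
loss_masked = uncertainty_weighted_normal_loss(rendered, prior, [0.0, 1.0])
# Trusting both priors equally lets the bad one contribute (loss close to 0.5).
loss_trusted = uncertainty_weighted_normal_loss(rendered, prior, [0.0, 0.0])
```

In this toy case the unreliable prior no longer pulls the rendered normal toward a wrong direction once it is flagged as uncertain, which is the behaviour the abstract attributes to the uncertainty estimate.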