Reconstructing indoor scenes from multi-view RGB images is challenging because flat, texture-less regions coexist with delicate, fine-grained regions. Recent methods leverage neural radiance fields aided by predicted surface normal priors to recover the scene geometry. These methods excel at producing complete and smooth results for floor and wall areas. However, they struggle to capture complex surfaces with high-frequency structures because the neural representation lacks capacity and the predicted normal priors are inaccurate. This work aims to reconstruct high-fidelity surfaces with fine-grained details by addressing both limitations. To improve the capacity of the implicit representation, we propose a hybrid architecture that represents low-frequency and high-frequency regions separately. To enhance the normal priors, we introduce a simple yet effective image sharpening and denoising technique, coupled with a network that estimates the pixel-wise uncertainty of the predicted surface normal vectors. Identifying this uncertainty prevents our model from being misled by unreliable surface normal supervision, which would otherwise hinder the accurate reconstruction of intricate geometries. Experiments on benchmark datasets show that our method outperforms existing methods in terms of reconstruction quality. Furthermore, the proposed method generalizes well to real-world indoor scenes captured with hand-held mobile phones. Our code is publicly available at: https://github.com/yec22/Fine-Grained-Indoor-Recon.
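To illustrate how a pixel-wise uncertainty estimate could keep unreliable normal priors from dominating the supervision, the following is a minimal sketch in plain Python. It is not taken from the paper or its released code: the cosine form of the loss, the weighting `w = 1 - u`, and the names `normalize` and `weighted_normal_loss` are all assumptions made for illustration.

```python
import math


def normalize(v):
    """Return the 3-vector v scaled to unit length."""
    norm = math.sqrt(sum(c * c for c in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return tuple(c / norm for c in v)


def weighted_normal_loss(rendered, prior, uncertainty):
    """Uncertainty-weighted cosine loss between rendered and prior normals.

    rendered, prior: lists of 3-tuples, one normal per pixel.
    uncertainty: list of floats in [0, 1]; 1 means the prior is not trusted.
    The weight w = 1 - u is a hypothetical choice, not the paper's formula.
    """
    if not (len(rendered) == len(prior) == len(uncertainty)):
        raise ValueError("inputs must have the same number of pixels")

    total = 0.0
    weight_sum = 0.0
    for n_r, n_p, u in zip(rendered, prior, uncertainty):
        n_r = normalize(n_r)
        n_p = normalize(n_p)
        cos = sum(a * b for a, b in zip(n_r, n_p))
        w = 1.0 - u  # assumed weighting: confident pixels count more
        total += w * (1.0 - cos)
        weight_sum += w

    # If every pixel is fully uncertain, the prior contributes nothing.
    return total / weight_sum if weight_sum > 0.0 else 0.0
```

In this sketch, a pixel whose prior disagrees with the rendered normal contributes nothing when its uncertainty is 1, so an erroneous prior on a thin structure would not pull the geometry toward a flat surface.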