Reconstructing indoor scenes from multi-view RGB images is challenging because flat, texture-less regions coexist with delicate, fine-grained ones. Recent methods leverage neural radiance fields aided by predicted surface normal priors to recover scene geometry. These methods excel at producing complete and smooth results for floor and wall areas. However, they struggle to capture complex surfaces with high-frequency structures, owing to the limited capacity of the neural representation and to inaccurately predicted normal priors. This work aims to reconstruct high-fidelity surfaces with fine-grained details by addressing both limitations. To improve the capacity of the implicit representation, we propose a hybrid architecture that represents low-frequency and high-frequency regions separately. To enhance the normal priors, we introduce a simple yet effective image sharpening and denoising technique, coupled with a network that estimates the pixel-wise uncertainty of the predicted surface normal vectors. Identifying this uncertainty prevents our model from being misled by unreliable surface normal supervision, which would otherwise hinder the accurate reconstruction of intricate geometry. Experiments on benchmark datasets show that our method outperforms existing methods in reconstruction quality. Furthermore, the proposed method generalizes well to real-world indoor scenes captured with hand-held mobile phones. Our code is publicly available at: https://github.com/yec22/Fine-Grained-Indoor-Recon.
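To make the role of pixel-wise uncertainty concrete, the following is a minimal sketch of one way a normal-prior loss could be down-weighted where the prior is unreliable. It is an illustrative assumption, not the formulation used in this work: the function name `uncertainty_weighted_normal_loss`, the cosine-based error, the linear weighting `1 - uncertainty`, and the assumption that uncertainty lies in [0, 1] are all hypothetical choices made for this example.

```python
import numpy as np


def uncertainty_weighted_normal_loss(pred_normals, prior_normals, uncertainty, eps=1e-8):
    """Hypothetical normal-prior loss that discounts uncertain pixels.

    pred_normals:  (N, 3) normals rendered from the implicit surface.
    prior_normals: (N, 3) normals predicted by a monocular prior network.
    uncertainty:   (N,) per-pixel uncertainty, assumed here to lie in [0, 1].
    """
    # Normalize both sets of vectors so the dot product is a cosine similarity.
    pred = pred_normals / (np.linalg.norm(pred_normals, axis=-1, keepdims=True) + eps)
    prior = prior_normals / (np.linalg.norm(prior_normals, axis=-1, keepdims=True) + eps)

    # Per-pixel angular error: 0 when aligned, 1 when orthogonal, 2 when opposite.
    per_pixel_error = 1.0 - np.sum(pred * prior, axis=-1)

    # Assumed weighting: fully trusted pixels get weight 1, fully uncertain get 0.
    weights = 1.0 - np.clip(uncertainty, 0.0, 1.0)

    # Weighted mean, so that discarded pixels do not dilute the loss.
    return float(np.sum(weights * per_pixel_error) / (np.sum(weights) + eps))


# Two pixels: the first prior agrees with the rendered normal, the second is orthogonal.
pred = np.array([[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]])
prior = np.array([[0.0, 0.0, 1.0], [1.0, 0.0, 0.0]])

loss_trust_all = uncertainty_weighted_normal_loss(pred, prior, np.array([0.0, 0.0]))
loss_mask_bad = uncertainty_weighted_normal_loss(pred, prior, np.array([0.0, 1.0]))
```

In this toy case, marking the second pixel as fully uncertain removes its contribution, so the wrong prior no longer pulls the surface away from the rendered geometry. A learned uncertainty would play the same role in regions with intricate structure, where predicted normals tend to be least reliable.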