Novel view synthesis and 3D modeling with implicit neural field representations have been shown to be very effective for calibrated multi-view cameras. Such representations are known to benefit from additional geometric and semantic supervision. Most existing methods that exploit additional supervision require dense pixel-wise labels or localized scene priors, and therefore cannot benefit from high-level, vague scene priors given in the form of scene descriptions. In this work, we leverage the geometric prior of Manhattan scenes to improve implicit neural radiance field representations. More precisely, we assume that the only available knowledge is that the indoor scene under investigation is Manhattan, with no additional information whatsoever and with an unknown Manhattan coordinate frame. This high-level prior is used to self-supervise the surface normals derived explicitly from the implicit neural fields. Our modeling allows us to cluster the derived normals and exploit their orthogonality constraints for self-supervision. Our extensive experiments on datasets of diverse indoor scenes demonstrate the significant benefit of the proposed method over the established baselines. The source code will be available at https://github.com/nikola3794/normal-clustering-nerf.
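To make the idea of clustering normals and penalizing non-orthogonal cluster directions concrete, the following is a minimal sketch under stated assumptions; it is not the released implementation. The abstract does not specify the clustering algorithm or the loss, so a simple sign-invariant (axial) k-means and a squared-cosine penalty are swapped in for illustration. The names `cluster_normals` and `orthogonality_penalty`, the choice of `k=3`, and the synthetic normals (which stand in for normals derived from a neural field) are all hypothetical.

```python
import numpy as np


def cluster_normals(normals, k=3, iters=20):
    """Axial k-means (illustrative): group unit normals by direction, ignoring sign."""
    n = normals / np.linalg.norm(normals, axis=1, keepdims=True)
    # Farthest-point initialization: start from the first normal, then repeatedly
    # add the normal least aligned (in absolute cosine) with the chosen centers.
    centers = [n[0]]
    for _ in range(k - 1):
        align = np.max(np.abs(n @ np.array(centers).T), axis=1)
        centers.append(n[np.argmin(align)])
    centers = np.array(centers)
    for _ in range(iters):
        # Assign each normal to the center with the largest absolute cosine.
        labels = np.argmax(np.abs(n @ centers.T), axis=1)
        for j in range(k):
            members = n[labels == j]
            if len(members) == 0:
                continue
            # The principal eigenvector of the scatter matrix is the cluster axis;
            # it is unaffected by flipping the sign of any member.
            _, vecs = np.linalg.eigh(members.T @ members)
            centers[j] = vecs[:, -1]
    labels = np.argmax(np.abs(n @ centers.T), axis=1)
    return centers, labels


def orthogonality_penalty(centers):
    """Sum of squared cosines over distinct pairs of cluster directions."""
    gram = centers @ centers.T
    off_diag = gram - np.diag(np.diag(gram))
    return 0.5 * np.sum(off_diag ** 2)


# Synthetic stand-in for normals of a Manhattan scene in an unknown frame.
rng = np.random.default_rng(0)
rotation, _ = np.linalg.qr(rng.normal(size=(3, 3)))  # columns: unknown Manhattan axes
axis_ids = rng.integers(0, 3, size=600)
signs = rng.choice([-1.0, 1.0], size=600)
normals = signs[:, None] * rotation[:, axis_ids].T + 0.02 * rng.normal(size=(600, 3))

centers, labels = cluster_normals(normals)
penalty = orthogonality_penalty(centers)
```

In this sketch the penalty is zero exactly when the three cluster directions are mutually orthogonal, so no knowledge of the Manhattan frame itself is required. In a training setting the same quantity would presumably be computed in a differentiable framework on normals obtained from the field, which is an assumption beyond what the abstract states.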