Implicit neural field representations have proven highly effective for novel view synthesis and 3D modeling from calibrated multi-view cameras. Such representations are known to benefit from additional geometric and semantic supervision. However, most existing methods that exploit additional supervision require dense pixel-wise labels or localized scene priors, so they cannot benefit from vague, high-level scene priors given in the form of scene descriptions. In this work, we aim to leverage the geometric prior of Manhattan scenes to improve implicit neural radiance field representations. More precisely, we assume only that the indoor scene under investigation is Manhattan, with no additional information whatsoever and with an unknown Manhattan coordinate frame. This high-level prior is used to self-supervise the surface normals derived explicitly from the implicit neural field. Our modeling allows us to cluster the derived normals and exploit their orthogonality constraints for self-supervision. Our exhaustive experiments on datasets of diverse indoor scenes demonstrate the significant benefit of the proposed method over established baselines. The source code is available at https://github.com/nikola3794/normal-clustering-nerf.
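To make the clustering-plus-orthogonality idea concrete, here is a minimal NumPy sketch. It is not the method in the linked repository. It assumes normals are the normalized negative density gradient, which is a common convention but is not stated in the text. It groups normals into three sign-invariant clusters with a k-means-style alternation on axial directions, then scores two things: how far the cluster directions are from being mutually orthogonal, and how far each normal is from its closest direction. All function names are hypothetical.

```python
import numpy as np


def normals_from_density_grad(grad_sigma):
    """Unit normals as the normalized negative density gradient (assumed convention)."""
    n = -np.asarray(grad_sigma, dtype=float)
    return n / np.linalg.norm(n, axis=-1, keepdims=True)


def assign_to_axes(normals, axes):
    """Sign-invariant assignment: n and -n fall into the same cluster.

    normals: (N, 3) unit vectors; axes: (3, 3) unit vectors, one per row.
    Returns per-normal cluster labels and the |cosine| similarity matrix.
    """
    sim = np.abs(normals @ axes.T)  # (N, 3)
    return np.argmax(sim, axis=1), sim


def update_axes(normals, labels, axes):
    """Replace each axis by the dominant eigenvector of its cluster's scatter matrix."""
    new_axes = axes.copy()
    for k in range(axes.shape[0]):
        members = normals[labels == k]
        if len(members) == 0:
            continue  # keep the previous direction for an empty cluster
        _, eigvecs = np.linalg.eigh(members.T @ members)
        new_axes[k] = eigvecs[:, -1]  # eigh sorts eigenvalues in ascending order
    return new_axes


def fit_manhattan_frame(normals, init_axes, iters=5):
    """k-means-style alternation on axial (sign-invariant) directions."""
    axes = np.array(init_axes, dtype=float)
    for _ in range(iters):
        labels, _ = assign_to_axes(normals, axes)
        axes = update_axes(normals, labels, axes)
    return axes


def orthogonality_loss(axes):
    """Squared deviation of the Gram matrix from identity (0 for an orthonormal frame)."""
    gram = axes @ axes.T
    return float(np.sum((gram - np.eye(axes.shape[0])) ** 2))


def alignment_loss(normals, axes):
    """Mean (1 - |cos|) between each normal and its closest axis."""
    _, sim = assign_to_axes(normals, axes)
    return float(np.mean(1.0 - sim.max(axis=1)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.deg2rad(30.0)
    # Synthetic Manhattan frame: a 30-degree rotation about z (rows are the axes).
    true_axes = np.array([[np.cos(t), np.sin(t), 0.0],
                          [-np.sin(t), np.cos(t), 0.0],
                          [0.0, 0.0, 1.0]])
    labels = rng.integers(0, 3, size=300)
    signs = rng.choice([-1.0, 1.0], size=(300, 1))
    noisy = true_axes[labels] * signs + 0.05 * rng.normal(size=(300, 3))
    normals = noisy / np.linalg.norm(noisy, axis=1, keepdims=True)

    # The frame is unknown, so start from the identity and let clustering recover it.
    axes = fit_manhattan_frame(normals, np.eye(3))
    print("orthogonality loss:", orthogonality_loss(axes))
    print("alignment loss:", alignment_loss(normals, axes))
```

The absolute value in `assign_to_axes` reflects that a floor and a ceiling, or two opposite walls, share one Manhattan direction. In an actual training loop, the alignment and orthogonality terms would be computed on differentiable normals from the network so that gradients reach the field. This plain NumPy version only illustrates the geometry.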