Omnidirectional multi-view stereo (MVS) vision is attractive for its ultra-wide field of view (FoV), which enables machines to perceive their 360° 3D surroundings. However, existing solutions require expensive dense depth labels for supervision, making them impractical in real-world applications. In this paper, we propose the first unsupervised omnidirectional MVS framework based on multiple fisheye images. To this end, we project all images to a virtual view center and composite two panoramic images with spherical geometry from two pairs of back-to-back fisheye images. The two 360° images form a stereo pair with a special pose, and photometric consistency is leveraged to establish the unsupervised constraint, which we term "Pseudo-Stereo Supervision". In addition, we propose Un-OmniMVS, an efficient unsupervised omnidirectional MVS network that improves inference speed through two efficient components. First, a novel feature extractor with frequency attention is proposed to simultaneously capture non-local Fourier features and local spatial features, explicitly enhancing the feature representation. Second, a variance-based light cost volume is put forward to reduce the computational complexity. Experiments show that the performance of our unsupervised solution is competitive with that of state-of-the-art (SoTA) supervised methods, with better generalization to real-world data.
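The abstract names two ingredients without giving formulas: a photometric-consistency constraint between the two panoramas and a variance-based cost volume. The following is a minimal sketch of the general idea behind each, not the Un-OmniMVS implementation. It assumes that per-view features have already been warped to a common set of depth hypotheses and flattened into plain lists, that the cost is the per-element variance across views (a common choice in the MVS literature), and that photometric consistency is measured with a simple mean absolute error. The function names `variance_cost` and `photometric_l1` are hypothetical.

```python
def variance_cost(view_features):
    """Per-element variance across views.

    view_features: list of V flattened feature volumes (assumed already warped
    to the same depth hypotheses), each a list of floats of equal length.
    Low variance means the views agree at that element.
    """
    num_views = len(view_features)
    if num_views < 2:
        raise ValueError("need at least two views to measure agreement")
    length = len(view_features[0])
    if any(len(f) != length for f in view_features):
        raise ValueError("all views must have the same number of elements")

    cost = []
    for values in zip(*view_features):  # one tuple of V values per element
        mean = sum(values) / num_views
        cost.append(sum((v - mean) ** 2 for v in values) / num_views)
    return cost


def photometric_l1(target, reconstructed, valid=None):
    """Mean absolute difference between a target image and its reconstruction.

    target, reconstructed: flattened pixel intensities of equal length.
    valid: optional list of booleans marking pixels to include (for example,
    pixels that are visible in both panoramas). This masking is an assumption
    of the sketch, not something stated in the abstract.
    """
    if len(target) != len(reconstructed):
        raise ValueError("images must have the same number of pixels")
    if valid is None:
        valid = [True] * len(target)
    errors = [abs(t - r) for t, r, keep in zip(target, reconstructed, valid) if keep]
    if not errors:
        return 0.0  # no valid pixels: nothing to penalize
    return sum(errors) / len(errors)
```

In practice these operations would run on image and feature tensors in a deep-learning framework; plain lists are used here only to keep the sketch dependency-free. Because the variance collapses any number of views into a single volume, the memory footprint of the cost volume does not grow with the number of input views.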