The absolute depth values of surrounding environments provide crucial cues for various assistive technologies, such as localization, navigation, and 3D structure estimation. We propose that accurate depth estimated from panoramic images can serve as a powerful and lightweight input for a wide range of downstream tasks requiring 3D information. While panoramic images that capture the full surrounding context can easily be acquired with commodity devices, depth estimated from them shares the limitations of conventional image-based depth estimation: performance deteriorates under large domain shifts, and absolute values remain difficult to infer unambiguously from 2D observations. Taking advantage of the holistic panoramic view, we mitigate these effects in a self-supervised manner by fine-tuning the network with geometric consistency during the test phase. Specifically, we construct a 3D point cloud from the current depth prediction and either project the point cloud from various viewpoints or apply stretches to the current input image to generate synthetic panoramas. We then minimize the discrepancies in the 3D structures estimated from these synthetic images, without collecting additional data. We empirically evaluate our method on robot navigation and map-free localization, where it yields large performance improvements. Our calibration method can therefore widen the applicability of panoramic depth estimation under various external conditions, serving as a key component for practical panorama-based machine vision systems. Code is available at \url{https://github.com/82magnolia/panoramic-depth-calibration}.
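The point-cloud step described above (back-projecting the current depth prediction and re-projecting it from a new viewpoint to obtain a synthetic panorama) can be sketched as follows. This is a minimal illustration, not the released implementation. It assumes an equirectangular panorama stored as lists of rows, a specific longitude/latitude convention (y axis up, longitude in [-pi, pi) left to right), a pure translation of the camera, and nearest-pixel splatting with a z-buffer. The helper names and the simple L1 `mean_abs_discrepancy` are hypothetical stand-ins for the paper's actual objective, and the depth network itself is omitted.

```python
import math


def pixel_to_ray(u, v, width, height):
    """Unit viewing ray for pixel (u, v) of an equirectangular panorama.

    Assumed convention: longitude spans [-pi, pi) from left to right,
    latitude spans [pi/2, -pi/2] from top to bottom, and y points up.
    """
    lon = (u + 0.5) / width * 2.0 * math.pi - math.pi
    lat = math.pi / 2.0 - (v + 0.5) / height * math.pi
    return (math.cos(lat) * math.sin(lon), math.sin(lat), math.cos(lat) * math.cos(lon))


def depth_to_points(depth):
    """Back-project an H x W depth panorama into 3D points, keeping source pixels."""
    height, width = len(depth), len(depth[0])
    points = []
    for v in range(height):
        for u in range(width):
            d = depth[v][u]
            if d <= 0:
                continue  # skip pixels without a valid depth
            rx, ry, rz = pixel_to_ray(u, v, width, height)
            points.append(((d * rx, d * ry, d * rz), (u, v)))
    return points


def render_from_viewpoint(depth, colors, offset):
    """Re-render depth and colors from a camera translated by `offset`.

    Pixels that receive no point stay at depth 0.0 / color None (holes);
    when several points land on one pixel, the z-buffer keeps the nearest.
    """
    height, width = len(depth), len(depth[0])
    new_depth = [[0.0] * width for _ in range(height)]
    new_colors = [[None] * width for _ in range(height)]
    ox, oy, oz = offset
    for (x, y, z), (u, v) in depth_to_points(depth):
        px, py, pz = x - ox, y - oy, z - oz  # point in the new camera frame
        r = math.sqrt(px * px + py * py + pz * pz)
        if r == 0.0:
            continue
        lon = math.atan2(px, pz)
        lat = math.asin(max(-1.0, min(1.0, py / r)))
        nu = int(math.floor((lon + math.pi) / (2.0 * math.pi) * width)) % width
        nv = min(height - 1, int(math.floor((math.pi / 2.0 - lat) / math.pi * height)))
        if new_depth[nv][nu] == 0.0 or r < new_depth[nv][nu]:
            new_depth[nv][nu] = r
            new_colors[nv][nu] = colors[v][u]
    return new_depth, new_colors


def mean_abs_discrepancy(pred, target):
    """Hypothetical L1 discrepancy over pixels where `target` is valid (> 0)."""
    total, count = 0.0, 0
    for row_p, row_t in zip(pred, target):
        for p, t in zip(row_p, row_t):
            if t > 0:
                total += abs(p - t)
                count += 1
    return total / count if count else 0.0
```

In a test-time loop of this kind, `new_colors` would play the role of the synthetic panorama fed back to the depth network, and the network's prediction on it would be compared against `new_depth` with a discrepancy such as the one above. Rendering with a zero `offset` reproduces the input exactly, which is a convenient sanity check for the projection convention.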