Despite progress in stereo depth estimation, omnidirectional imaging remains underexplored, mainly due to the lack of appropriate data. We introduce Helvipad, a real-world dataset for omnidirectional stereo depth estimation, featuring 40K frames from video sequences across diverse environments, including crowded indoor and outdoor scenes with various lighting conditions. Collected using two 360° cameras in a top-bottom setup and a LiDAR sensor, the dataset includes accurate depth and disparity labels obtained by projecting 3D point clouds onto equirectangular images. Additionally, we provide an augmented training set with increased label density obtained via depth completion. We benchmark leading stereo depth estimation models for both standard and omnidirectional images. The results show that while recent stereo methods perform decently, a challenge persists in accurately estimating depth in omnidirectional imaging. To address this, we introduce necessary adaptations to stereo models, leading to improved performance.
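The label-generation step above projects LiDAR points onto the equirectangular image plane. A minimal sketch of such a projection is shown below; the axis conventions (x right, y down, z forward) and image resolution are assumptions for illustration, not the dataset's actual calibration, which would also involve the LiDAR-to-camera extrinsics.

```python
import numpy as np

def project_to_equirectangular(points, width=1920, height=960):
    """Project 3D points (N, 3) in the camera frame onto an
    equirectangular image, returning pixel coordinates and depth.

    Assumed conventions (illustrative only): x right, y down,
    z forward; azimuth spans [-pi, pi] across image columns,
    elevation spans [-pi/2, pi/2] across image rows.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    depth = np.linalg.norm(points, axis=1)           # radial distance to each point
    theta = np.arctan2(x, z)                         # azimuth angle
    phi = np.arcsin(np.clip(y / depth, -1.0, 1.0))   # elevation angle
    u = (theta / (2 * np.pi) + 0.5) * width          # column index
    v = (phi / np.pi + 0.5) * height                 # row index
    return u, v, depth
```

Rasterizing `depth` at the resulting `(u, v)` locations yields a sparse depth map, which is what the depth-completion step then densifies.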