Depth representations based on multiple near frontal-parallel planes have demonstrated impressive results in self-supervised monocular depth estimation (MDE). However, such a representation causes discontinuities in the ground, because the ground is perpendicular to the frontal-parallel planes, and this is detrimental to identifying drivable space in autonomous driving. In this paper, we propose PlaneDepth, a novel representation based on orthogonal planes, comprising vertical planes and ground planes. For an input image, PlaneDepth estimates the depth distribution using a Laplacian Mixture Model based on the orthogonal planes. These planes are used to synthesize a reference view, which provides the self-supervision signal. Further, we find that the widely used resizing and cropping data augmentation breaks the orthogonality assumptions, leading to inferior plane predictions. We address this problem by explicitly constructing the resizing and cropping transformation and using it to rectify the predefined planes and the predicted camera pose. Moreover, we propose an augmented self-distillation loss supervised with a bilateral occlusion mask to make the orthogonal planes representation more robust to occlusions. Thanks to the orthogonal planes representation, we can extract the ground plane in an unsupervised manner, which is important for autonomous driving. Extensive experiments on the KITTI dataset demonstrate the effectiveness and efficiency of our method. The code is available at https://github.com/svip-lab/PlaneDepth.
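To illustrate why resizing and cropping can break a predefined ground plane, the following is a minimal sketch under simplifying assumptions; it is not the PlaneDepth implementation. It assumes a pinhole camera with a level optical axis mounted at a fixed height above flat ground, and it rectifies the camera intrinsics rather than the planes and pose described above. The `Intrinsics` class, the helpers `resize_crop` and `ground_depth`, and all numeric values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Intrinsics:
    """Pinhole intrinsics in pixels (hypothetical helper, not from PlaneDepth)."""
    fx: float
    fy: float
    cx: float
    cy: float


def resize_crop(k: Intrinsics, scale: float, x0: float, y0: float) -> Intrinsics:
    """Intrinsics of an image resized by `scale`, then cropped at top-left (x0, y0)."""
    return Intrinsics(k.fx * scale, k.fy * scale,
                      k.cx * scale - x0, k.cy * scale - y0)


def ground_depth(k: Intrinsics, v: float, cam_height: float) -> float:
    """Depth of flat ground seen at pixel row v.

    Assumes a level camera (y axis pointing down) mounted cam_height metres
    above the ground. A ground point has Y = cam_height and projects to
    v = fy * Y / Z + cy, hence Z = fy * cam_height / (v - cy).
    """
    if v <= k.cy:
        raise ValueError("row is at or above the horizon; no ground intersection")
    return k.fy * cam_height / (v - k.cy)


# Illustrative numbers only (not KITTI calibration values).
k = Intrinsics(fx=720.0, fy=720.0, cx=620.0, cy=180.0)
h = 1.65                       # assumed camera height in metres
scale, x0, y0 = 1.5, 100.0, 60.0

v = 300.0                      # a row in the original image
v_aug = v * scale - y0         # the same pixel row after resize + crop

z_true = ground_depth(k, v, h)
# Using the original intrinsics on the augmented image gives a wrong ground depth.
z_naive = ground_depth(k, v_aug, h)
# Applying the same resize + crop to the intrinsics restores consistency.
z_fixed = ground_depth(resize_crop(k, scale, x0, y0), v_aug, h)
```

With these made-up values, `z_fixed` agrees with `z_true`, while `z_naive` is several metres off. This is the kind of inconsistency that an explicit resizing and cropping transformation is meant to remove.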