Unsupervised monocular depth estimation has shown great potential in many works, owing to its low annotation cost and accuracy comparable to that of supervised methods. To further improve performance, recent works mainly focus on designing more complex network structures and exploiting extra supervisory information, e.g., semantic segmentation. These methods optimize the model by exploiting, to varying degrees, the reconstruction relationship between the target and reference images. However, previous work has shown that this image-reconstruction optimization is prone to getting trapped in local minima. In this paper, our core idea is to guide the optimization with prior knowledge from a pretrained Flow-Net. We show that the bottleneck of unsupervised monocular depth estimation can be broken with our simple but effective framework, named FG-Depth. In particular, we propose (i) a flow distillation loss to replace the typical photometric loss, which limits the capacity of the model, and (ii) a prior-flow-based mask to remove invalid pixels that introduce noise into the training loss. Extensive experiments demonstrate the effectiveness of each component, and our approach achieves state-of-the-art results on both the KITTI and NYU-Depth-v2 datasets.
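To make the two proposed components concrete, the following is a minimal NumPy sketch under stated assumptions, not the FG-Depth implementation. It assumes that the flow distillation loss is an L1 penalty between the rigid flow induced by the predicted depth and camera pose and the prior flow from the pretrained Flow-Net, and that the mask discards pixels whose prior flow points outside the image. The abstract specifies neither detail, and the function names `rigid_flow`, `in_bounds_mask`, and `flow_distillation_loss` are hypothetical.

```python
import numpy as np


def rigid_flow(depth, K, R, t):
    """Flow induced by depth and camera motion (target view -> reference view).

    depth: (H, W) depth of the target image; K: (3, 3) intrinsics;
    R: (3, 3) rotation and t: (3,) translation from target to reference camera.
    Returns a (2, H, W) array of (du, dv) displacements in pixels.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w, dtype=np.float64),
                       np.arange(h, dtype=np.float64))
    pix = np.stack([u, v, np.ones_like(u)], axis=0).reshape(3, -1)
    # Back-project pixels to 3D points in the target camera frame.
    cam = (np.linalg.inv(K) @ pix) * depth.reshape(1, -1)
    # Move the points into the reference camera frame and project them.
    proj = K @ (R @ cam + t.reshape(3, 1))
    uv_ref = proj[:2] / np.clip(proj[2:], 1e-6, None)
    return (uv_ref - pix[:2]).reshape(2, h, w)


def in_bounds_mask(prior_flow):
    """Assumed validity rule: keep pixels whose prior flow stays inside the image."""
    _, h, w = prior_flow.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    u2 = u + prior_flow[0]
    v2 = v + prior_flow[1]
    return (u2 >= 0) & (u2 <= w - 1) & (v2 >= 0) & (v2 <= h - 1)


def flow_distillation_loss(depth, K, R, t, prior_flow, mask):
    """Masked mean L1 distance between the rigid flow and the prior flow."""
    diff = np.abs(rigid_flow(depth, K, R, t) - prior_flow).sum(axis=0)
    n = mask.sum()
    return float((diff * mask).sum() / n) if n > 0 else 0.0
```

In a training loop, `depth`, `R`, and `t` would come from the depth and pose networks, `prior_flow` from the frozen Flow-Net, and the loss would be written in a differentiable framework so that gradients reach the depth and pose networks; NumPy is used here only to keep the geometry easy to read.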