Monocular depth estimation has drawn widespread attention from the vision community due to its broad applications. In this paper, we propose a novel physics (geometry)-driven deep learning framework for monocular depth estimation, built on the assumption that 3D scenes are composed of piece-wise planes. Specifically, we introduce a new normal-distance head that outputs a pixel-level surface normal and plane-to-origin distance, from which the depth at each position is derived. The normal and distance are regularized by a plane-aware consistency constraint that we develop. We further integrate an additional depth head to improve the robustness of the proposed framework. To fully exploit the strengths of these two heads, we develop an effective contrastive iterative refinement module that refines depth in a complementary manner according to the depth uncertainty. Extensive experiments indicate that the proposed method outperforms previous state-of-the-art methods on the NYU-Depth-v2, KITTI and SUN RGB-D datasets. Notably, it ranked 1st among all submissions on the KITTI depth prediction online benchmark at the time of submission.
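As a rough illustration of how depth can be derived from a surface normal and a plane-to-origin distance, the sketch below uses the standard pinhole relation: a pixel p = (u, v, 1) with depth d back-projects to X = d K^{-1} p, and if X lies on the plane n^T X = rho, then d = rho / (n^T K^{-1} p). The function name `depth_from_normal_distance`, the intrinsics matrix `K`, and the `eps` guard are assumptions made for this example; the abstract does not specify the exact formulation used by the normal-distance head.

```python
import numpy as np

def depth_from_normal_distance(normal, distance, K, eps=1e-6):
    """Illustrative conversion of per-pixel plane parameters to depth.

    normal:   (H, W, 3) array of unit surface normals in camera coordinates.
    distance: (H, W) array of plane-to-origin distances.
    K:        (3, 3) pinhole camera intrinsics (assumed known).
    """
    h, w = distance.shape
    # Homogeneous pixel coordinates (u, v, 1) for every position.
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pixels = np.stack([u, v, np.ones_like(u)], axis=-1).astype(np.float64)
    # Back-projected rays K^{-1} p, computed for all pixels at once.
    rays = pixels @ np.linalg.inv(K).T
    # n^T K^{-1} p; replace near-zero values with eps to avoid division by zero.
    denom = np.sum(normal * rays, axis=-1)
    denom = np.where(np.abs(denom) < eps, eps, denom)
    return distance / denom
```

For a fronto-parallel plane, where every normal is (0, 0, 1), the denominator equals 1 at every pixel, so the returned depth equals the plane-to-origin distance, which is a quick sanity check for the relation.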