Unsupervised monocular depth estimation techniques have demonstrated encouraging results but typically assume that the scene is static. These techniques suffer when trained on dynamic scenes, where the apparent motion of an object can be explained equally well by hypothesizing its independent motion or by altering its depth. This ambiguity causes depth estimators to predict erroneous depth for moving objects. To resolve this issue, we introduce Dynamo-Depth, a unifying approach that disambiguates dynamic motion by jointly learning monocular depth, a 3D independent flow field, and motion segmentation from unlabeled monocular videos. Our key insight is that a good initial estimate of motion segmentation is sufficient for jointly learning depth and independent motion despite the fundamental underlying ambiguity. Our proposed method achieves state-of-the-art monocular depth estimation performance on the Waymo Open and nuScenes datasets, with significant improvement in the depth of moving objects. Code and additional results are available at https://dynamo-depth.github.io.
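The depth/motion ambiguity described above can be made concrete with a toy calculation. The sketch below is an illustrative assumption, not part of Dynamo-Depth: it uses a pinhole camera (focal length f = 1, one lateral coordinate) that translates forward by `t_z` between two frames, and an object point that translates along the same axis by `m_z`. All function names are hypothetical. It solves for the static point whose two image positions exactly match those of the moving point, giving z' = z · t_z / (t_z − m_z).

```python
# Toy model (an assumption for illustration, not the Dynamo-Depth pipeline):
# a pinhole camera moves forward by t_z along its optical axis between two frames,
# and an object point moves along the same axis by m_z (its independent motion).


def project(x: float, z: float, f: float = 1.0) -> float:
    """Horizontal pinhole projection u = f * X / Z (requires z > 0)."""
    if z <= 0:
        raise ValueError("point must lie in front of the camera")
    return f * x / z


def track_moving(x: float, z: float, t_z: float, m_z: float, f: float = 1.0):
    """Image positions (u0, u1) of a point at (x, z) that moves by m_z while the camera moves by t_z."""
    return project(x, z, f), project(x, z + m_z - t_z, f)


def track_static(x: float, z: float, t_z: float, f: float = 1.0):
    """Image positions (u0, u1) of a static point at (x, z) while the camera moves by t_z."""
    return project(x, z, f), project(x, z - t_z, f)


def ambiguous_static_point(x: float, z: float, t_z: float, m_z: float):
    """Static point (x', z') whose 2D track matches the moving point's track.

    Solving x'/z' = x/z and x'/(z' - t_z) = x/(z + m_z - t_z) gives
    z' = z * t_z / (t_z - m_z) and x' = x * z' / z.
    Returns None when no static point in front of the camera explains the track:
    m_z == t_z (object co-moves with the camera, z' -> infinity) or m_z > t_z
    or t_z <= 0 (z' would not be a valid positive depth).
    """
    if t_z <= 0 or m_z >= t_z:
        return None
    z_prime = z * t_z / (t_z - m_z)
    return x * z_prime / z, z_prime


if __name__ == "__main__":
    x, z, t_z, m_z = 2.0, 10.0, 1.0, 0.5
    x_s, z_s = ambiguous_static_point(x, z, t_z, m_z)
    print(z_s)  # 20.0
    print(track_moving(x, z, t_z, m_z))
    print(track_static(x_s, z_s, t_z))
```

With these values, an object receding at half the camera's speed from depth 10 produces the same pair of image positions as a static point at depth 20, so photometric reconstruction alone cannot distinguish the two explanations. An approaching object (`m_z < 0`) is instead explained by a static point that is too close, and as `m_z` approaches `t_z` the static explanation recedes without bound, which is consistent with the often-reported tendency of self-supervised depth models to predict near-infinite depth for vehicles driving ahead at the camera's speed.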