Depth perception is crucial for a wide range of robotic applications. Multi-frame self-supervised depth estimation methods have gained research interest because they can leverage large-scale, unlabeled real-world data. However, these self-supervised methods often rely on a static-scene assumption, and their performance degrades in dynamic environments. To address this issue, we present Motion-Aware Loss (MAL), which leverages the temporal relation among consecutive input frames and a novel distillation scheme between the teacher and student networks in multi-frame self-supervised depth estimation methods. Specifically, we associate the spatial locations of moving objects with the temporal order of the input frames to eliminate errors induced by object motion. Meanwhile, we enhance the original distillation scheme in multi-frame methods to better exploit the knowledge of the teacher network. MAL is a novel, plug-and-play module designed for seamless integration into multi-frame self-supervised monocular depth estimation methods. Adding MAL to previous state-of-the-art methods reduces depth estimation errors by up to 4.2% on the KITTI benchmark and 10.8% on the CityScapes benchmark.
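To give a rough intuition for combining self-supervision with teacher distillation in motion-affected regions, the sketch below shows one possible way such a loss could be assembled. This is a minimal illustration, not the paper's actual formulation: the function name, the assumption that a per-pixel motion mask is available, and the simple L1 distillation term are all hypothetical choices made here for clarity.

```python
import numpy as np

def motion_aware_loss_sketch(student_depth, teacher_depth,
                             photometric_error, motion_mask):
    """Hypothetical sketch (not the paper's loss): supervise static pixels
    with the photometric error and distill teacher depth on pixels flagged
    as moving, where motion_mask is 1 at moving-object pixels, 0 elsewhere."""
    static_mask = 1.0 - motion_mask
    # Photometric self-supervision averaged over static pixels only,
    # since reprojection errors there are not corrupted by object motion.
    photo = (photometric_error * static_mask).sum() / max(static_mask.sum(), 1.0)
    # L1 distillation from the teacher averaged over moving pixels,
    # where the student's self-supervised signal is unreliable.
    distill = (np.abs(student_depth - teacher_depth) * motion_mask).sum() \
              / max(motion_mask.sum(), 1.0)
    return photo + distill
```

Under this toy formulation, masking out moving pixels keeps motion-induced reprojection errors from polluting the photometric term, while the teacher still provides a training signal on exactly those pixels.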