Depth perception is crucial for a wide range of robotic applications. Multi-frame self-supervised depth estimation methods have gained research interest because they can leverage large-scale, unlabeled real-world data. However, these methods often rely on the assumption of a static scene, and their performance tends to degrade in dynamic environments. To address this issue, we present Motion-Aware Loss (MAL), which leverages the temporal relation among consecutive input frames together with a novel distillation scheme between the teacher and student networks of multi-frame self-supervised depth estimation methods. Specifically, we associate the spatial locations of moving objects with the temporal order of the input frames to eliminate errors induced by object motion. Meanwhile, we enhance the original distillation scheme of multi-frame methods to better exploit the knowledge of the teacher network. MAL is a plug-and-play module designed for seamless integration into multi-frame self-supervised monocular depth estimation methods. Adding MAL to previous state-of-the-art methods reduces depth estimation errors by up to 4.2% on KITTI and 10.8% on CityScapes.
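As a rough illustration of what a teacher-student distillation term in a multi-frame method can look like, the sketch below computes a masked L1 loss. A single-frame teacher's depth supervises the multi-frame student only at pixels where the multi-frame (cost-volume) depth disagrees strongly with the teacher, as can happen on moving objects. This is a generic sketch loosely modeled on baseline schemes such as ManyDepth's consistency loss. It is not MAL itself. The function name `masked_distillation_loss`, the ratio-based mask rule, and the default threshold of 2.0 are hypothetical choices, and depths are assumed to be strictly positive.

```python
from typing import List


def masked_distillation_loss(
    student_depth: List[float],
    teacher_depth: List[float],
    cost_volume_depth: List[float],
    ratio_threshold: float = 2.0,  # hypothetical threshold, not from the paper
) -> float:
    """Mean L1 distance between student and teacher depths, taken only over
    pixels flagged as unreliable for multi-frame matching.

    Hypothetical mask rule: a pixel is unreliable when the cost-volume depth
    and the teacher depth differ by more than `ratio_threshold` (as a ratio).
    All depths are assumed to be strictly positive.
    """
    if not (len(student_depth) == len(teacher_depth) == len(cost_volume_depth)):
        raise ValueError("depth maps must have the same number of pixels")

    total = 0.0
    count = 0
    for s, t, c in zip(student_depth, teacher_depth, cost_volume_depth):
        # Symmetric disagreement ratio (always >= 1); large values suggest the
        # multi-frame cue is unreliable here, e.g. because the object moved.
        ratio = max(c / t, t / c)
        if ratio > ratio_threshold:
            # In a real training framework the teacher depth would be detached
            # so that gradients flow only into the student.
            total += abs(s - t)
            count += 1

    # No unreliable pixels means there is nothing to distill.
    return total / count if count else 0.0


if __name__ == "__main__":
    student = [10.0, 5.0, 20.0]
    teacher = [10.0, 4.0, 18.0]
    cost_volume = [10.5, 12.0, 18.0]  # pixel 1 disagrees with the teacher by 3x
    print(masked_distillation_loss(student, teacher, cost_volume))
```

In this toy example only the second pixel exceeds the threshold, so the loss is |5.0 - 4.0| = 1.0. Restricting the teacher's supervision to unreliable pixels lets the student keep its multi-frame estimate wherever matching across frames is consistent.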