Self-supervised depth estimation has recently drawn considerable attention because it can improve the 3D sensing capabilities of self-driving vehicles. However, it intrinsically relies on the photometric consistency assumption, which hardly holds at nighttime. Although various supervised nighttime image enhancement methods have been proposed, they generalize poorly to challenging driving scenarios. To this end, we propose the first method that jointly learns a nighttime image enhancer and a depth estimator without using ground truth for either task. Our method tightly couples the two self-supervised tasks through a newly proposed uncertain pixel masking strategy. This strategy originates from the observation that nighttime images suffer not only from underexposed regions but also from overexposed regions. By fitting a bridge-shaped curve to the illumination map distribution, both kinds of regions are suppressed and the two tasks are bridged naturally. We benchmark the method on two established datasets, nuScenes and RobotCar, and demonstrate state-of-the-art performance on both. Detailed ablations also reveal the mechanism behind our proposal. Finally, to mitigate the problem of sparse ground truth in existing datasets, we provide a new photo-realistically enhanced nighttime dataset based on CARLA, which brings meaningful new challenges to the community. Code, data, and models are available at https://github.com/ucaszyp/STEPS.
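To illustrate the idea of uncertain pixel masking, the following is a minimal sketch and not the released implementation. It assumes an illumination map normalized to [0, 1] and builds a bridge-shaped soft mask as the product of a rising and a falling sigmoid, so that both very dark and very bright pixels contribute little to a photometric loss. The function names `bridge_mask` and `masked_photometric_loss`, the thresholds `low` and `high`, and the `sharpness` parameter are all hypothetical; in particular, the thresholds are fixed here rather than fitted to the illumination distribution as the abstract describes.

```python
import numpy as np


def bridge_mask(illum, low=0.2, high=0.8, sharpness=20.0):
    """Soft mask close to 1 for mid-range illumination and close to 0
    for underexposed (illum << low) or overexposed (illum >> high) pixels.

    All parameter values are illustrative assumptions, not values from the paper.
    """
    illum = np.asarray(illum, dtype=np.float64)
    rise = 1.0 / (1.0 + np.exp(-sharpness * (illum - low)))   # suppresses dark pixels
    fall = 1.0 / (1.0 + np.exp(sharpness * (illum - high)))   # suppresses bright pixels
    return rise * fall


def masked_photometric_loss(target, warped, illum):
    """Mask-weighted mean absolute error between a target frame and a
    frame warped into the target view (L1 term only, for brevity)."""
    err = np.abs(np.asarray(target, dtype=np.float64) - np.asarray(warped, dtype=np.float64))
    if err.ndim == 3:
        # Average over the channel axis so the error matches the H x W mask.
        err = err.mean(axis=-1)
    mask = bridge_mask(illum)
    return float((mask * err).sum() / (mask.sum() + 1e-8))
```

In this sketch the illumination map is what links the two tasks: it would come from the enhancement branch, while the masked loss would supervise the depth branch. How the actual method fits the curve and combines the loss terms is specified in the paper and repository, not here.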