In recent years, self-supervised monocular depth estimation has drawn much attention because it requires no depth annotations and has achieved remarkable results on standard benchmarks. However, most existing methods focus only on either daytime or nighttime images, so their performance degrades on the other domain because of the large domain shift between daytime and nighttime images. To address this problem, we propose a two-branch network named GlocalFuse-Depth for self-supervised depth estimation of all-day images. The daytime and nighttime images of each input pair are fed into a CNN branch and a Transformer branch, respectively, so that both fine-grained details and global dependencies can be captured efficiently. In addition, we propose a novel fusion module that fuses the multi-dimensional features from the two branches. Extensive experiments show that GlocalFuse-Depth achieves state-of-the-art results for all-day images on the Oxford RobotCar dataset, demonstrating the superiority of our method.
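The abstract does not specify the layers of either branch or the design of the fusion module. The sketch below is only a minimal illustration of the local-versus-global distinction, assuming NumPy. A 3x3 mean filter stands in for the CNN branch, single-head self-attention with identity query/key/value projections stands in for the Transformer branch, and `fuse` is a hypothetical concatenate-then-gate step. None of these names or operations come from the paper, and this is not its fusion module.

```python
import numpy as np


def local_branch(img: np.ndarray) -> np.ndarray:
    """Stand-in for the CNN branch: a 3x3 mean filter.

    Each output pixel depends only on its 3x3 neighbourhood (small receptive field).
    """
    h, w, c = img.shape
    padded = np.pad(img, ((1, 1), (1, 1), (0, 0)), mode="edge")
    out = np.zeros_like(img, dtype=float)
    for dy in range(3):
        for dx in range(3):
            out += padded[dy:dy + h, dx:dx + w, :]
    return out / 9.0


def global_branch(img: np.ndarray) -> np.ndarray:
    """Stand-in for the Transformer branch: single-head self-attention over all pixels.

    Query/key/value projections are identity matrices here (an assumption for brevity),
    so every output pixel is a softmax-weighted mix of ALL input pixels.
    """
    h, w, c = img.shape
    tokens = img.reshape(h * w, c).astype(float)
    scores = tokens @ tokens.T / np.sqrt(c)        # (N, N) scaled dot-product scores
    scores -= scores.max(axis=1, keepdims=True)    # subtract row max for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)  # row-wise softmax
    return (weights @ tokens).reshape(h, w, c)


def fuse(f_local: np.ndarray, f_global: np.ndarray) -> np.ndarray:
    """Hypothetical fusion: concatenate channels, then scale each channel
    by a sigmoid of its spatial mean (a simple channel gate)."""
    cat = np.concatenate([f_local, f_global], axis=-1)
    gate = 1.0 / (1.0 + np.exp(-cat.mean(axis=(0, 1))))
    return cat * gate


# Usage: the daytime image goes to the local branch, the nighttime image to the global branch.
rng = np.random.default_rng(0)
day, night = rng.random((8, 8, 3)), rng.random((8, 8, 3))
fused = fuse(local_branch(day), global_branch(night))
print(fused.shape)  # (8, 8, 6)
```

The point of the contrast is receptive field. Perturbing a distant pixel leaves the local branch's output at a given position unchanged, but it alters the attention output everywhere. That difference is what the abstract means by "global dependencies."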