Depth estimation plays an important role in robotic perception systems. The self-supervised monocular paradigm has gained significant attention because it removes the reliance on depth annotations during training. Despite recent advances, existing self-supervised methods still underutilize the available training data, which limits their generalization ability. In this paper, we adopt two data augmentation techniques, namely Resizing-Cropping and Splitting-Permuting, to fully exploit the potential of the training datasets. Specifically, the original image and the two generated augmented images are fed into the training pipeline simultaneously, and we leverage them to conduct self-distillation. Additionally, we introduce a detail-enhanced DepthNet with an extra full-scale branch in the encoder and a grid decoder to improve the restoration of fine details in depth maps. Experimental results demonstrate that our method achieves state-of-the-art performance on the KITTI benchmark with both the raw and the improved ground truth. Moreover, our models show superior generalization performance when transferred to the Make3D and NYUv2 datasets. Our code is available at https://github.com/Sauf4896/BDEdepth.
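The abstract only names the two augmentations and the self-distillation step. As a rough illustration, the following sketch assumes that Splitting-Permuting cuts an image into two parts along one axis and swaps them, and that Resizing-Cropping enlarges the image and crops a window of the original size. The function names `split_permute`, `resize_crop`, and `distill_loss`, as well as the L1 pseudo-label loss, are hypothetical and are not taken from the BDEdepth code.

```python
from typing import Optional

import numpy as np


def split_permute(img: np.ndarray, axis: int = 1, split: Optional[int] = None) -> np.ndarray:
    """Hypothetical Splitting-Permuting: cut `img` into two parts along `axis`
    (at the midpoint unless `split` is given) and swap their order."""
    if split is None:
        split = img.shape[axis] // 2
    first, second = np.split(img, [split], axis=axis)
    return np.concatenate([second, first], axis=axis)


def resize_crop(img: np.ndarray, scale: float = 1.25, top: int = 0, left: int = 0) -> np.ndarray:
    """Hypothetical Resizing-Cropping: enlarge `img` by `scale` using floor-index
    (nearest-neighbour style) sampling, then crop a window of the original size."""
    if scale < 1.0:
        raise ValueError("scale must be >= 1 so the crop fits inside the resized image")
    h, w = img.shape[:2]
    new_h, new_w = int(round(h * scale)), int(round(w * scale))
    if top + h > new_h or left + w > new_w:
        raise ValueError("crop window falls outside the resized image")
    # Map every output pixel back to a source pixel by flooring its scaled index.
    rows = np.clip((np.arange(new_h) / scale).astype(int), 0, h - 1)
    cols = np.clip((np.arange(new_w) / scale).astype(int), 0, w - 1)
    resized = img[rows][:, cols]
    return resized[top:top + h, left:left + w]


def distill_loss(pred_aug: np.ndarray, pred_orig: np.ndarray, axis: int = 1,
                 split: Optional[int] = None) -> float:
    """Hypothetical L1 self-distillation term for the split-permuted view:
    the original-view depth, rearranged the same way, acts as a pseudo-label."""
    target = split_permute(pred_orig, axis=axis, split=split)
    return float(np.mean(np.abs(pred_aug - target)))
```

For example, `split_permute(np.arange(8).reshape(2, 4))` returns `[[2, 3, 0, 1], [6, 7, 4, 5]]`. The sketch only builds a distillation target for the split-permuted view, because that rearrangement moves pixels without changing their values; how a target for the resized-cropped view is formed is not specified in the abstract and is left out here.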